Skip to content

feat: Execution Monitor View with real-time SSE streaming (#332)#339

Merged
frankbria merged 12 commits into
mainfrom
feature/332-execution-monitor-view
Feb 6, 2026
Merged

feat: Execution Monitor View with real-time SSE streaming (#332)#339
frankbria merged 12 commits into
mainfrom
feature/332-execution-monitor-view

Conversation

@frankbria

@frankbria frankbria commented Feb 6, 2026

Copy link
Copy Markdown
Owner

Summary

  • Real-time execution monitoring via SSE at /execution/[taskId] with auto-scrolling event stream, progress bar, agent state badges, and file change sidebar
  • Batch execution monitoring at /execution?batch=<id> with accordion task rows, per-task live streaming, and batch-level stop/cancel controls
  • Bridges navigation gap from [Phase 3] Task Board View - Kanban & Batch Execution #331: Task Board execute buttons now navigate to the Execution Monitor
  • 53 new tests (253 total passing) covering the hook, event rendering, blocker form submission, and style utilities

New files (19)

  • 2 Next.js pages: single-task /execution/[taskId] + landing /execution
  • 11 execution components: EventItem, PlanningEvent, FileChangeEvent, ShellCommandEvent, VerificationEvent, BlockerEvent, ExecutionHeader, ProgressIndicator, EventStream, ChangesSidebar, BatchExecutionMonitor
  • 1 hook: useExecutionMonitor (rAF-batched state accumulation over useTaskStream)
  • 1 utility: eventStyles.ts (agent state derivation, badge styles, icons)
  • 3 Shadcn UI primitives: alert-dialog, progress, scroll-area
  • 1 barrel export

Modified files (5)

  • TaskBoardView.tsx: execute handlers now navigate to /execution
  • AppSidebar.tsx: Execution nav link enabled
  • api.ts: added blockersApi, batchesApi, tasksApi.stopExecution/resumeExecution
  • types/index.ts: added Blocker, BatchResponse, UIAgentState types
  • @hugeicons/react mock: added execution monitor icons

Code review findings addressed

  1. await onStop() so loading spinner is visible
  2. Removed nested ScrollArea to fix scroll detection
  3. Stable polling via ref-based task dedup (no stale closure)
  4. try/catch on batch stop/cancel handlers
  5. blocked completion → BLOCKED state (not FAILED)
  6. aria-label on blocker textarea
  7. aria-expanded on collapsible sidebar toggle
  8. aria-expanded on batch accordion rows

Test plan

  • npx tsc --noEmit passes (only pre-existing test file errors)
  • npx next build succeeds (all pages generate)
  • npx jest — 253 tests passing, 24 suites
  • Manual: Start single task from Task Board → navigates to /execution/[taskId]
  • Manual: SSE events stream in real-time with progress bar
  • Manual: Blocker interrupt shows inline answer form
  • Manual: Batch execute → /execution?batch=<id> with accordion rows
  • Manual: Stop button shows confirmation dialog

Closes #332

Summary by CodeRabbit

  • New Features

    • Full execution monitor UI: live event stream with auto-scroll/new-events control, progress indicator, connection status, header with Stop, task and batch pages, batch monitor, changes sidebar, and inline blocker answering.
  • API

    • UI-accessible blocker and batch endpoints plus task stop/resume controls and structured event streaming.
  • Tests

    • Extensive new tests covering execution UI, hooks, event styles, and streaming utilities.
  • Chores

    • Added execution-related icon mocks.
  • Dependencies

    • Added Radix UI packages for dialog, progress, and scroll-area.

Test User added 3 commits February 6, 2026 09:32
#332)

Implements Steps 1-12 of the execution monitor plan:

- Bridge navigation gap: Task Board execute buttons now navigate to /execution
- Add Blocker, BatchResponse, and UIAgentState types
- Extend API client with blockersApi, batchesApi, and execution controls
- Create event styling utilities with agent state derivation
- Build 6 event display components (planning, file change, shell, verification, blocker, base item)
- Build 4 container components (header, progress, event stream, changes sidebar)
- Add 3 Shadcn UI primitives (alert-dialog, progress, scroll-area)
- Create useExecutionMonitor hook with rAF-batched state accumulation
- Create single-task execution page at /execution/[taskId]
- Create execution landing page with batch/task routing
- Create BatchExecutionMonitor with accordion task rows
- Enable Execution nav link in sidebar
- Add barrel export for all execution components
- useExecutionMonitor hook: 12 tests covering state derivation,
  event accumulation, progress tracking, completion detection,
  heartbeat filtering, and taskId reset
- EventItem component: 12 tests covering delegation to specialized
  components for each event type (planning, file change, shell,
  verification, blocker, completion, error)
- BlockerEvent component: 7 tests covering form rendering, submit
  behavior, API call, error display, and whitespace trimming
- eventStyles utilities: 14 tests covering deriveAgentState for all
  event types and completeness of style/icon/label maps
- Update AppSidebar test to reflect Execution nav now enabled
- Add execution monitor icons to Hugeicons mock

252 tests passing, 24 suites.
1. ExecutionHeader: await onStop() so loading spinner is visible
2. EventStream: remove ScrollArea wrapper to fix scroll detection
   (nested overflow containers broke onScroll + scrollIntoView)
3. BatchExecutionMonitor: use ref instead of state for task ID
   dedup to prevent polling interval reset on each task load
4. BatchExecutionMonitor: add try/catch to stop/cancel handlers
5. eventStyles: map blocked completion to BLOCKED, not FAILED
6. BlockerEvent: add aria-label to textarea
7. ChangesSidebar: add aria-expanded to collapse button
8. BatchExecutionMonitor: add aria-expanded to accordion rows
@coderabbitai

coderabbitai Bot commented Feb 6, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

Adds a client-side Execution Monitor with SSE streaming, new hooks (useTaskStream opt, useExecutionMonitor), a suite of execution UI components/pages (including blocker answer flow, event stream with auto-scroll, progress indicator, changes sidebar, and batch monitor), backend streaming/task changes, API additions for blockers/batches/stop/resume, Radix UI primitives, icon mock additions, and many tests.

Changes

Cohort / File(s) Summary
Icon Mocks
web-ui/__mocks__/@hugeicons/react.js
Appended 8 mocked icons for execution UI (Idea01Icon, ArrowTurnBackwardIcon, CommandLineIcon, AlertDiamondIcon, WifiDisconnected01Icon, SidebarLeftIcon, ArrowDown01Icon, StopIcon).
Types
web-ui/src/types/index.ts
Added blocker and batch types and UIAgentState (Blocker, BlockerListResponse, BatchResponse, BlockerStatus, UIAgentState).
API Layer
web-ui/src/lib/api.ts
Added blockersApi and batchesApi; added tasksApi.stopExecution and tasksApi.resumeExecution; updated structured error formatting.
Hooks
web-ui/src/hooks/useTaskStream.ts, web-ui/src/hooks/useExecutionMonitor.ts
useTaskStream gained workspacePath and explicit close handling; useExecutionMonitor added to batch SSE events via rAF and derive monitor state (agentState, steps, message, completion, changedFiles) with close().
Execution Components
web-ui/src/components/execution/..., web-ui/src/components/execution/index.ts
New execution UI: EventItem, PlanningEvent, FileChangeEvent, ShellCommandEvent, VerificationEvent, BlockerEvent (answer flow), EventStream (auto-scroll/new-events), ExecutionHeader, ProgressIndicator, ChangesSidebar, BatchExecutionMonitor; re-exported via index.
Pages & Routing
web-ui/src/app/execution/page.tsx, web-ui/src/app/execution/[taskId]/page.tsx
Added execution landing and task detail pages: workspace hydration, SSE hook wiring, stop/resume handlers, completion banners, batch/task routing and redirects.
UI Primitives (Radix)
web-ui/src/components/ui/..., web-ui/package.json
Added Radix-based AlertDialog, Progress, and ScrollArea components and added @radix-ui/* dependencies.
Layout & Navigation
web-ui/src/components/layout/AppSidebar.tsx, web-ui/src/components/tasks/TaskBoardView.tsx
Enabled Execution nav item; TaskBoardView navigation now routes to /execution/{taskId} or batch view after start.
Styling & Utilities
web-ui/src/lib/eventStyles.ts
New central mappings and deriveAgentState to map backend events → UIAgentState, badge styles, labels, icons, and SSE connection dot styles.
Backend SSE & Tasks
codeframe/ui/routers/streaming_v2.py, codeframe/ui/routers/tasks_v2.py
Refactored SSE utilities: streaming_v2 now provides queue-based subscription utilities with shorter heartbeat; tasks_v2 adds structured event stream endpoint (/stream), raw output endpoint (/output), background task execution, and publisher wiring.
Tests
web-ui/__tests__/**, tests/ui/**
Added/updated extensive tests: useExecutionMonitor, eventStyles, EventItem, BlockerEvent, useTaskStream signature updates, AppSidebar expectations, and server-side streaming router/integration tests.
Env
web-ui/.env.example
Added NEXT_PUBLIC_SSE_URL for frontend SSE connection configuration.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Page as ExecutionPage
    participant Hook as useExecutionMonitor
    participant SSE as SSE_Stream
    participant Stream as EventStream
    participant Blocker as BlockerEvent
    participant API as Tasks/Blockers_API

    User->>Page: open /execution/{taskId}
    Page->>Hook: start(taskId, workspacePath)
    Hook->>SSE: connect to /api/v2/tasks/{taskId}/stream
    loop streaming events
        SSE->>Hook: ExecutionEvent (progress/output/blocker/completion/heartbeat)
        Hook->>Hook: buffer event & schedule rAF flush
        Hook->>Page: batched state update (agentState, steps, message, events)
        Page->>Stream: render events list
    end
    alt Blocker event
        Stream->>Blocker: render blocker question
        User->>Blocker: submit answer
        Blocker->>API: POST /api/v2/blockers/{id}/answer
        API-->>Blocker: success
        Blocker->>Page: onAnswered callback (resume stream)
    end
    alt Stop action
        User->>Page: click Stop
        Page->>API: POST /api/v2/tasks/{id}/stop
        API-->>Page: success
        Page->>Hook: close()
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 I hopped to the monitor, whiskers bright,
Streams of events now dancing in light,
Blockers bowed and answers flew,
Progress bars hummed as tasks pushed through,
A carrot cheer for realtime delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 68.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: Execution Monitor View with real-time SSE streaming (#332)' directly and clearly describes the main objective of the PR: implementing an Execution Monitor View with SSE streaming capability.
Linked Issues check ✅ Passed The PR comprehensively implements all primary coding requirements from issue #332: Execution Monitor UI components (EventItem variants, ExecutionHeader, ProgressIndicator, EventStream, ChangesSidebar, BatchExecutionMonitor), SSE streaming via useExecutionMonitor hook, real-time event rendering with color-coded badges, collapsible diffs/output, progress bar, auto-scroll with new events indicator, blocker form, stop/pause controls, file sidebar, and reconnection handling.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the #332 objectives: execution monitor components, streaming infrastructure, navigation integration, and supporting utilities. No extraneous modifications detected beyond the stated scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/332-execution-monitor-view

Comment @coderabbitai help to get the list of available commands and usage tips.

@macroscopeapp

macroscopeapp Bot commented Feb 6, 2026

Copy link
Copy Markdown
Contributor

Add real-time Execution Monitor views and move structured SSE events to GET /api/v2/tasks/{task_id}/stream with 15s heartbeats

Introduce a client Execution Monitor (task and batch) with SSE-driven event rendering; add GET /api/v2/tasks/{task_id}/stream for structured events and move raw output to GET /api/v2/tasks/{task_id}/output; update the SSE generator to send heartbeats every 15s and close on completion; wire UI stop/cancel actions and add blocker handling, progress display, and an execution landing page.

📍Where to Start

Start with the new SSE event endpoint handler stream_task_events in tasks_v2.py, then review the SSE generator in streaming_v2.py and the client hook useExecutionMonitor in web-ui/src/hooks/useExecutionMonitor.ts.


Macroscope summarized acd08c5.

@claude

claude Bot commented Feb 6, 2026

Copy link
Copy Markdown

Code Review: Execution Monitor View (#332)

Overview

This is a well-architected and comprehensive implementation of real-time execution monitoring for CodeFRAME. The PR successfully bridges the navigation gap identified in #331 and delivers both single-task and batch execution monitoring with SSE streaming. The code quality is high, with excellent separation of concerns and 53 new tests.


✅ Strengths

Architecture & Design

  1. Excellent hook composition: useExecutionMonitor properly composes useTaskStream with rAF-batched state accumulation to prevent per-event re-renders. This is a performance best practice.

  2. Clean separation of concerns:

    • Event-specific components (PlanningEvent, FileChangeEvent, etc.) handle their own rendering
    • eventStyles.ts centralizes styling logic
    • API layer properly abstracted in api.ts
  3. Smart state management: The use of refs (eventsRef, rafRef, fetchedTaskIdsRef) to prevent stale closures and unnecessary re-renders shows solid React expertise.

  4. Accessibility improvements:

    • aria-label on blocker textarea (BlockerEvent.tsx:77)
    • aria-expanded on collapsible elements (BatchExecutionMonitor.tsx:277)

Implementation Quality

  1. Interrupt pattern for blockers: The inline blocker answer form with submit/loading states is well-implemented with proper error handling.

  2. Batch execution monitoring: The accordion pattern with per-task streaming is intuitive. Auto-expanding the first IN_PROGRESS task (BatchExecutionMonitor.tsx:101-107) is a nice UX touch.

  3. Completion state handling: Proper differentiation between completed, failed, and blocked states with appropriate UI feedback.

  4. Test coverage: 53 new tests covering hooks, event rendering, blocker submission, and style utilities is excellent.


🔍 Issues & Recommendations

Critical

None identified. The code review findings from the PR description appear to have been addressed.

High Priority

  1. Memory leak potential in BatchExecutionMonitor polling (BatchExecutionMonitor.tsx:89-98)

    The polling logic has a subtle issue: if fetchBatch is added to the dependency array, it will recreate the interval on every state change. Currently it's excluded (line 98), which is correct, but the ESLint disable should be more specific.

    Better approach: Use a ref for the latest fetchBatch:

    const fetchBatchRef = useRef(fetchBatch);
    fetchBatchRef.current = fetchBatch;
    
    useEffect(() => {
      const isActive = batch && !['completed', 'failed', 'cancelled'].includes(batch.status);
      if (isActive) {
        pollRef.current = setInterval(() => fetchBatchRef.current(), 5000);
      }
      return () => {
        if (pollRef.current) clearInterval(pollRef.current);
      };
    }, [batch?.status]); // Clean dependency
  2. Error boundary missing (execution/[taskId]/page.tsx)

    The execution monitor handles API errors, but there's no error boundary to catch runtime errors during event rendering. Consider wrapping the EventStream in an error boundary to prevent the entire page from crashing if a malformed event arrives.

Medium Priority

  1. Type safety: loose status typing (BatchExecutionMonitor.tsx:31-42)

    The statusConfig uses Record<string, ...> which allows any string key. Consider using a union type for stronger type safety.

  2. Connection status not reflected in UI (execution/[taskId]/page.tsx)

    The ExecutionHeader receives sseStatus but it's unclear if the connection status is visible to users. Consider adding a connection indicator (e.g., "Live", "Reconnecting", "Disconnected") in the header.

  3. Blocker answer submission race condition (BlockerEvent.tsx:28-42)

    If the API is slow and the user navigates away, the state update on line 40 may occur after unmount. Add isMounted cleanup pattern.

  4. Hard-coded polling interval (BatchExecutionMonitor.tsx:93)

    5-second polling is reasonable but should be extracted to a named constant for documentation.

Low Priority

  1. Magic string: 'blocked' completion status (useExecutionMonitor.ts:58)

    The completion status uses lowercase 'blocked' while agent states use uppercase. This inconsistency could lead to bugs. Consider normalizing.

  2. Missing null checks in event renderers

    Some event-specific components assume certain fields exist. Add defensive checks.

  3. Accessibility: keyboard navigation in batch accordion

    Consider using a roving tabindex pattern for arrow key navigation between rows.

  4. EventStream scroll behavior (execution/[taskId]/page.tsx:109-114)

    The PR description mentions "auto-scrolling event stream" but I don't see scroll-to-bottom logic in the code. Verify this is working as expected.


🔒 Security Considerations

  1. No XSS vulnerabilities identified: Event data is rendered via React (auto-escaped). Good job avoiding dangerouslySetInnerHTML.

  2. API error messages exposed to UI: The ApiError.detail is displayed directly to users (BlockerEvent.tsx:38). Ensure backend sanitizes error messages to avoid leaking sensitive info.


🚀 Performance Considerations

  1. rAF batching in useExecutionMonitor: Excellent pattern to reduce re-renders during rapid event streams.

  2. Memoization opportunities:

    • CompletionBanner component could be memoized to prevent re-renders
    • BatchTaskRow could use React.memo since it only needs to update when its specific task changes
  3. Event array growth: The events array in useExecutionMonitor grows unbounded. For long-running tasks (hours), this could cause memory issues. Consider:

    • Limiting to last N events (e.g., 1000)
    • Implementing virtualized scrolling for the event list

📝 Documentation & Testing

Strengths

  • Inline comments explain complex logic (rAF batching, polling, etc.)
  • Test coverage is comprehensive (53 new tests)
  • Component docstrings describe purpose and patterns

Suggestions

  • Add a README or documentation section explaining the execution monitor architecture
  • Document expected event flow (planning → execution → verification → completion)
  • Add integration tests for the full page, not just individual components

🎯 Summary

This is production-ready code with only minor refinements needed. The implementation demonstrates:

  • Strong React patterns (hooks, refs, memoization)
  • Good accessibility practices
  • Proper error handling
  • Excellent test coverage

Recommended Actions (Priority Order)

  1. Fix BatchExecutionMonitor polling dependency (use ref pattern)
  2. Add error boundary around EventStream
  3. Tighten type safety on status strings
  4. Add connection status indicator in UI
  5. Document event array growth limitation

Approval Status

Approved with minor suggestions. The issues identified are primarily polish items that don't block merging. Great work on this feature!


CLAUDE.md Compliance: ✅

  • Follows Phase 3 UI rebuild patterns
  • Thin adapter over core (SSE/API layer)
  • No core module imports in UI
  • Maintains headless architecture

Comment thread web-ui/src/components/ui/progress.tsx
};

function getStatusConfig(status: string) {
return statusConfig[status] ?? { icon: Loading03Icon, className: 'text-gray-400', label: status };

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low

execution/BatchExecutionMonitor.tsx:41 Using bracket notation on a plain object can return inherited properties like constructor. Consider using Object.hasOwn(statusConfig, status) before the lookup to avoid prototype pollution.

Suggested change
return statusConfig[status] ?? { icon: Loading03Icon, className: 'text-gray-400', label: status };
return Object.hasOwn(statusConfig, status) ? statusConfig[status] : { icon: Loading03Icon, className: 'text-gray-400', label: status };

🚀 Want me to fix this? Reply ex: "fix it for me".

Comment thread web-ui/src/components/execution/BatchExecutionMonitor.tsx
Comment on lines +69 to +76
for (const taskId of data.task_ids) {
if (!fetchedTaskIdsRef.current.has(taskId)) {
fetchedTaskIdsRef.current.add(taskId);
tasksApi.getOne(workspacePath, taskId).then((task) => {
setTasks((prev) => ({ ...prev, [taskId]: task }));
}).catch(() => {
// Task may have been deleted
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low

execution/BatchExecutionMonitor.tsx:69 Task IDs are added to fetchedTaskIdsRef before the fetch completes. If tasksApi.getOne fails, the ID stays in the set and future polls skip it permanently. Consider adding to the set only on success (inside .then()), or removing from the set in the .catch() block.

Suggested change
for (const taskId of data.task_ids) {
if (!fetchedTaskIdsRef.current.has(taskId)) {
fetchedTaskIdsRef.current.add(taskId);
tasksApi.getOne(workspacePath, taskId).then((task) => {
setTasks((prev) => ({ ...prev, [taskId]: task }));
}).catch(() => {
// Task may have been deleted
});
for (const taskId of data.task_ids) {
if (!fetchedTaskIdsRef.current.has(taskId)) {
tasksApi.getOne(workspacePath, taskId).then((task) => {
fetchedTaskIdsRef.current.add(taskId);
setTasks((prev) => ({ ...prev, [taskId]: task }));
}).catch(() => {
// Task may have been deleted or fetch failed - will retry on next poll
});

🚀 Want me to fix this? Reply ex: "fix it for me".

Comment thread web-ui/src/components/execution/BatchExecutionMonitor.tsx Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@web-ui/src/app/execution/`[taskId]/page.tsx:
- Around line 31-36: The effect around useEffect calling tasksApi.getOne should
not swallow errors silently; update the handler in the useEffect so that on
rejection you set an explicit error or fallback state (e.g., setTask(null) or
setTask({ notFound: true }) and/or set an error boolean like setTaskError)
instead of an empty catch; reference tasksApi.getOne, setTask and the useEffect
callback to add a .catch(err => { setTask(null); setTaskError(err || true); })
(and adjust any loading logic that relies on task being undefined) so the UI can
show an error or "task not found" message instead of "Loading task…".
- Around line 42-45: The handleStop callback calls tasksApi.stopExecution but
doesn't catch errors, causing unhandled promise rejections; update the
handleStop implementation to wrap the await
tasksApi.stopExecution(workspacePath, taskId) call in try/catch (keeping any
existing finally logic), log the error (e.g., console.error or process logger)
and surface user feedback (e.g., set an error state or trigger a toast/snackbar)
so ExecutionHeader.handleStop shows a clear failure message to the user instead
of letting the rejection bubble up.

In `@web-ui/src/app/execution/page.tsx`:
- Around line 64-83: The effect currently sets resolving to false whenever it
early-returns (including when workspacePath is still null), causing a flash of
the empty state; change the early-return logic in the useEffect so you first
bail out silently if workspacePath is falsy (just return) and only call
setResolving(false) when there are explicit params (batchId || taskIdParam), and
keep the existing tasksApi.getAll(...) flow for when workspacePath is present so
resolving is cleared only after a successful redirect or on fetch error; update
the useEffect that references workspacePath, batchId, taskIdParam, setResolving,
tasksApi.getAll and router.replace accordingly.

In `@web-ui/src/components/execution/BatchExecutionMonitor.tsx`:
- Around line 89-98: In BatchExecutionMonitor's useEffect that sets/clears
pollRef using fetchBatch, the active-check compares batch.status against
lowercase strings; update that check to use the backend's UPPERCASE enum values
(e.g., replace ['completed','failed','cancelled'] with
['COMPLETED','FAILED','CANCELLED']) so the interval stops when batch.status
becomes a terminal state; ensure any other status comparisons in this component
(e.g., the later checks that reference 'COMPLETED'/'DONE') remain consistent
with uppercase enums.

In `@web-ui/src/components/execution/ShellCommandEvent.tsx`:
- Around line 28-35: The toggle Button in the ShellCommandEvent component
doesn't expose its state to assistive tech; add an aria-expanded attribute bound
to the existing expanded state (e.g., aria-expanded={expanded}) on the Button
used with onClick={() => setExpanded(!expanded)} so screen readers know whether
the collapsible region is open; ensure the attribute updates when expanded
changes and optionally add aria-controls referencing the collapsible region's id
if present.

In `@web-ui/src/components/execution/VerificationEvent.tsx`:
- Line 19: The current pass detection uses a too-broad regex in the
VerificationEvent component (const passed = /pass/i.test(event.message ?? ''))
and can match words like "bypass" or "password"; change the regex to use word
boundaries such as /\bpass\b/i (or /^\s*pass\b/i if you want to anchor to the
start) so it only matches the standalone word "pass", keeping the
null-coalescing on event.message; update the const passed assignment to use this
new regex and keep the rest of the component unchanged.

In `@web-ui/src/hooks/useExecutionMonitor.ts`:
- Around line 88-118: The rAF flush in scheduleFlush can overwrite a recent
DISCONNECTED state because it derives agentState only from events; update
scheduleFlush (and the similar flush at the other location) to consult the
current SSE status when computing agentState: capture the latest sseStatus
(e.g., via a sseStatusRef or by adding sseStatus to the closure) and if it
indicates disconnected/closed set agentState = 'DISCONNECTED' (or preserve
DISCONNECTED) before calling deriveAgentState(findLastNonHeartbeat(...)); ensure
rafRef logic and setState merge still run but that agentState is gated by the
live SSE status so a pending rAF cannot revert a DISCONNECTED state.
🧹 Nitpick comments (16)
web-ui/__tests__/components/layout/AppSidebar.test.tsx (1)

80-88: Test name is now slightly misleading.

The test is titled "renders disabled items as non-link spans" but now also asserts that Tasks and Execution are links. Consider renaming to something like "renders enabled items as links and disabled items as spans" for clarity.

web-ui/src/components/execution/ShellCommandEvent.tsx (1)

3-4: Unused imports: CheckmarkCircle01Icon and Cancel01Icon.

These two icons are imported but never referenced in the component's JSX. They can be removed to keep imports clean.

🧹 Proposed fix
-import { CommandLineIcon, CheckmarkCircle01Icon, Cancel01Icon } from '@hugeicons/react';
+import { CommandLineIcon } from '@hugeicons/react';
web-ui/src/components/execution/FileChangeEvent.tsx (2)

22-23: Duplicated regex pattern with EventItem.isFileChangeEvent.

The prefix-stripping regex here (/^(Creating|Editing|Deleting) file:\s*/i) is semantically coupled with the detection regex in EventItem.tsx line 45 (/^(creating|editing|deleting) file:/i). If the backend message format changes, both must be updated in tandem. Consider extracting a shared constant or utility (e.g., FILE_CHANGE_PREFIX_RE) to keep them in sync.


39-44: Placeholder diff content noted — ensure follow-up.

The expanded section currently re-displays event.message as placeholder. The comment at lines 41-42 explains diff content will come from subsequent OutputEvents. This is fine for the initial implementation, but the "View Diff" label may mislead users since no actual diff is shown yet.

web-ui/src/components/execution/EventItem.tsx (1)

87-115: Consider guarding pe.step in the generic step branch.

At line 109, pe.step is rendered inside the pe.total_steps > 0 guard, but step itself could be 0 or potentially undefined depending on the backend. If step is undefined, the UI would show "Step undefined/3". This is a minor edge case, but a simple guard or default would make it more robust.

Optional fix
           {pe.total_steps > 0 && (
             <span className="mr-1.5 text-xs text-muted-foreground">
-              Step {pe.step}/{pe.total_steps}
+              Step {pe.step ?? '?'}/{pe.total_steps}
             </span>
           )}
web-ui/src/components/execution/ChangesSidebar.tsx (1)

26-29: Consider adding role and aria-label to the sidebar container.

Screen readers won't identify this region's purpose. A quick role="complementary" and aria-label="Changed files" on the outer <div> would improve discoverability.

♿ Proposed fix
     <div
+      role="complementary"
+      aria-label="Changed files"
       className={`shrink-0 rounded-lg border bg-card transition-all ${
         collapsed ? 'w-10' : 'w-64'
       }`}
     >
web-ui/src/components/execution/BatchExecutionMonitor.tsx (2)

100-107: Auto-expand effect re-runs on every poll due to batch object reference.

batch is a new object on every setBatch(data) call, so this effect fires every 5 seconds. It short-circuits via the expandedTaskId guard, so it's not broken, but you could tighten the dependency to avoid unnecessary runs.

♻️ Narrow the dependency
   useEffect(() => {
     if (!batch || expandedTaskId) return;
     const inProgress = batch.task_ids.find(
       (id) => batch.results[id] === 'IN_PROGRESS'
     );
     if (inProgress) setExpandedTaskId(inProgress);
-  }, [batch, expandedTaskId]);
+  }, [batch?.status, batch?.task_ids, expandedTaskId]);

This still re-evaluates when the batch updates meaningfully but avoids re-running on every identical poll response. Alternatively, you could JSON-stringify the results map as a dependency key.


85-87: The eslint-disable for exhaustive-deps is unnecessary.

fetchBatch is a useCallback with [workspacePath, batchId] as deps — the same values listed in this effect's dependency array. Adding fetchBatch to the deps array would be correct and eliminate the need for the suppression comment.

♻️ Proposed fix
   useEffect(() => {
     fetchBatch();
-  }, [batchId, workspacePath]); // eslint-disable-line react-hooks/exhaustive-deps
+  }, [fetchBatch]);
web-ui/__tests__/components/execution/EventItem.test.tsx (1)

174-203: Consider adding a test for completion with no files_modified field.

The success completion test assumes files_modified is present. A quick test with files_modified omitted (or empty []) would cover the fallback rendering path and guard against NPE regressions.

web-ui/src/types/index.ts (1)

125-136: BatchResponse fields should use union types to match backend enums for stronger type safety.

strategy is typed as string but BatchStrategy is already defined on Line 77 as 'serial' | 'parallel' | 'auto'. The backend enums (from codeframe/core/conductor.py) show that status has six values: 'PENDING' | 'RUNNING' | 'COMPLETED' | 'PARTIAL' | 'FAILED' | 'CANCELLED', and on_failure has two values: 'continue' | 'stop'. Replacing the string types with these union types will catch mismatches at compile time rather than runtime.

♻️ Proposed tightening
+export type BatchStatus = 'PENDING' | 'RUNNING' | 'COMPLETED' | 'PARTIAL' | 'FAILED' | 'CANCELLED';
+export type BatchOnFailure = 'continue' | 'stop';
+
 export interface BatchResponse {
   id: string;
   workspace_id: string;
   task_ids: string[];
-  status: string;
-  strategy: string;
+  status: BatchStatus;
+  strategy: BatchStrategy;
   max_parallel: number;
-  on_failure: string;
+  on_failure: BatchOnFailure;
   started_at: string | null;
   completed_at: string | null;
   results: Record<string, string>; // task_id → RunStatus
 }
web-ui/src/components/execution/ExecutionHeader.tsx (1)

85-93: Connection status dot is not accessible to screen readers.

The <span> with title is not focusable and title is not reliably announced by assistive technology. The visible text label at Line 92 mitigates this for sighted users, but consider adding role="status" and an aria-label on the containing <div> so screen readers announce connection changes.

Suggested improvement
-        <div className="flex items-center gap-1.5">
+        <div className="flex items-center gap-1.5" role="status" aria-label={`Connection: ${sseStatus}`}>
           <span
             className={cn(
               'inline-block h-2 w-2 rounded-full',
               connectionDotStyles[sseStatus] ?? connectionDotStyles.idle
             )}
-            title={`Connection: ${sseStatus}`}
+            aria-hidden="true"
           />
web-ui/src/app/execution/[taskId]/page.tsx (1)

123-195: CompletionBanner uses raw <button> elements instead of the project's Button component.

The rest of the execution UI consistently uses the shared Button primitive. Using raw <button> here loses theming consistency (focus rings, variants, etc.) and duplicates styling. Consider switching to Button with appropriate variants.

web-ui/src/hooks/useExecutionMonitor.ts (1)

130-136: Unbounded event accumulation — eventsRef grows without limit.

Each new event creates a new array via spread ([...eventsRef.current, event]). For long-running tasks with thousands of events this causes increasing memory and GC pressure. Consider capping with a sliding window or switching to push() to avoid copying the entire array on each event.

web-ui/src/lib/eventStyles.ts (3)

73-83: EXECUTING and COMPLETED share identical badge styles.

Both map to bg-green-100 text-green-800. While labels and icons differentiate them, the identical color may reduce at-a-glance scannability in a busy event stream. Consider a distinct shade (e.g., bg-emerald-100 text-emerald-800) for COMPLETED if you want terminal states to stand out visually.


36-68: Remove type assertions in switch cases—use discriminated union narrowing instead.

Lines 39 and 56 use as casts (as ProgressEvent, as CompletionEvent). Since ExecutionEvent is a discriminated union with a literal event_type on each variant, TypeScript automatically narrows the type inside each case branch. The casts can be safely removed:

case 'progress': {
  const { phase } = event;  // event is already ProgressEvent here
  // ...
}
case 'completion': {
  const { status } = event;  // event is already CompletionEvent here
  // ...
}

This improves clarity and ensures type safety is enforced by the type system rather than bypassed by assertions.


129-135: Tighten the key type for connectionDotStyles.

Record<string, string> allows any key, losing type safety. Narrow the key union to match the actual SSE status values ('idle' | 'connecting' | 'open' | 'closed' | 'error') — this will catch typos and improve autocomplete. Note: SSEStatus is already exported from hooks/useEventSource.ts and could be reused instead of defining a local type.

♻️ Suggested change
-export const connectionDotStyles: Record<string, string> = {
+import type { SSEStatus } from '@/hooks/useEventSource';
+
+export const connectionDotStyles: Record<SSEStatus, string> = {
   idle: 'bg-gray-400',
   connecting: 'bg-yellow-400 animate-pulse',
   open: 'bg-green-500',
   closed: 'bg-gray-400',
   error: 'bg-red-500',
 };

Comment thread web-ui/src/app/execution/[taskId]/page.tsx
Comment thread web-ui/src/app/execution/[taskId]/page.tsx
Comment on lines +64 to +83
useEffect(() => {
if (batchId || taskIdParam || !workspacePath) {
setResolving(false);
return;
}

tasksApi
.getAll(workspacePath, 'IN_PROGRESS')
.then((response) => {
const tasks = response.tasks ?? [];
if (tasks.length > 0) {
router.replace(`/execution/${tasks[0].id}`);
} else {
setResolving(false);
}
})
.catch(() => {
setResolving(false);
});
}, [workspacePath, batchId, taskIdParam, router]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Brief flash of "No active execution" when workspace hydrates without batch/task params.

On first render, workspacePath is null, so the effect hits the early return at Line 65 and unconditionally calls setResolving(false). When workspacePath hydrates and triggers a re-run, resolving is already false, so the component falls through to the empty state at Line 131 until the fetch completes and triggers a redirect.

Split the early-return logic so that resolving is only set to false when it's truly resolved (i.e. batch/task param is present), not when workspace hasn't loaded yet.

Proposed fix
   useEffect(() => {
-    if (batchId || taskIdParam || !workspacePath) {
+    if (batchId || taskIdParam) {
+      setResolving(false);
+      return;
+    }
+    if (!workspacePath) {
+      // Still waiting for workspace hydration — keep resolving=true
       setResolving(false);
       return;
     }

     tasksApi
       .getAll(workspacePath, 'IN_PROGRESS')
       .then((response) => {

Wait — if we simply return without setResolving(false) when !workspacePath, the initial resolving=true is preserved, so the spinner keeps showing until workspace hydrates and the fetch completes. Here's the corrected diff:

   useEffect(() => {
-    if (batchId || taskIdParam || !workspacePath) {
-      setResolving(false);
-      return;
-    }
+    if (batchId || taskIdParam) {
+      setResolving(false);
+      return;
+    }
+    if (!workspacePath) return; // keep resolving=true until workspace is ready

     tasksApi
       .getAll(workspacePath, 'IN_PROGRESS')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
useEffect(() => {
if (batchId || taskIdParam || !workspacePath) {
setResolving(false);
return;
}
tasksApi
.getAll(workspacePath, 'IN_PROGRESS')
.then((response) => {
const tasks = response.tasks ?? [];
if (tasks.length > 0) {
router.replace(`/execution/${tasks[0].id}`);
} else {
setResolving(false);
}
})
.catch(() => {
setResolving(false);
});
}, [workspacePath, batchId, taskIdParam, router]);
useEffect(() => {
if (batchId || taskIdParam) {
setResolving(false);
return;
}
if (!workspacePath) return; // keep resolving=true until workspace is ready
tasksApi
.getAll(workspacePath, 'IN_PROGRESS')
.then((response) => {
const tasks = response.tasks ?? [];
if (tasks.length > 0) {
router.replace(`/execution/${tasks[0].id}`);
} else {
setResolving(false);
}
})
.catch(() => {
setResolving(false);
});
}, [workspacePath, batchId, taskIdParam, router]);
🤖 Prompt for AI Agents
In `@web-ui/src/app/execution/page.tsx` around lines 64 - 83, The effect currently
sets resolving to false whenever it early-returns (including when workspacePath
is still null), causing a flash of the empty state; change the early-return
logic in the useEffect so you first bail out silently if workspacePath is falsy
(just return) and only call setResolving(false) when there are explicit params
(batchId || taskIdParam), and keep the existing tasksApi.getAll(...) flow for
when workspacePath is present so resolving is cleared only after a successful
redirect or on fetch error; update the useEffect that references workspacePath,
batchId, taskIdParam, setResolving, tasksApi.getAll and router.replace
accordingly.

Comment thread web-ui/src/components/execution/BatchExecutionMonitor.tsx
Comment thread web-ui/src/components/execution/ShellCommandEvent.tsx
Comment thread web-ui/src/components/execution/VerificationEvent.tsx Outdated
Comment thread web-ui/src/hooks/useExecutionMonitor.ts
Resolve 17 issues from multi-angle PR review (3 critical, 7 high, 7 medium):

Critical:
- Remove duplicate SSE stream route from streaming_v2.py (route collision with tasks_v2)
- Add workspace_path to SSE EventSource URL in useTaskStream
- Add aria-live="polite" and role="log" to EventStream for screen readers

High:
- Fix batch status case mismatch (lowercase → UPPERCASE) in BatchExecutionMonitor
- Keep EventStream mounted when task is BLOCKED (enables inline blocker answers)
- Add role="alert" to CompletionBanner variants
- Add aria-expanded to FileChangeEvent and ShellCommandEvent toggle buttons
- Add responsive flex-col/md:flex-row layout for mobile
- Add dark mode variants to badge and connection dot styles
- Add task-not-found error state instead of infinite loading

Medium:
- Add try/catch to single-task handleStop
- Fix overly broad /pass/i regex → /\bpassed?\b/i in VerificationEvent
- Prevent rAF flush from overwriting DISCONNECTED agent state
- Pass value prop to ProgressPrimitive.Root for aria-valuenow
- Clear error state on successful batch fetch
- Add encodeURIComponent to new API path interpolations
- Add SSE disconnection banner with reconnect button
@claude

claude Bot commented Feb 6, 2026

Copy link
Copy Markdown

Code Review: Execution Monitor View

This is a comprehensive implementation of real-time task execution monitoring with SSE streaming. The architecture is well-designed with clear separation of concerns. Here's my detailed feedback:


✅ Strengths

Architecture & Design

  • Excellent separation of concerns: Core components are cleanly isolated (EventStream, ExecutionHeader, ProgressIndicator, ChangesSidebar)
  • Smart performance optimization: The useExecutionMonitor hook uses requestAnimationFrame batching to prevent per-event re-renders during rapid event streams
  • Robust state management: The deriveAgentState utility centralizes agent state derivation, ensuring consistency across the UI
  • Comprehensive test coverage: 53 new tests covering hooks, components, and event rendering

Implementation Quality

  • SSE connection resilience: Proper handling of disconnections with user feedback and reconnection options
  • Accessibility: Good use of ARIA attributes (role="alert", aria-label on blocker textarea, semantic HTML)
  • Type safety: Strong TypeScript usage with discriminated unions for ExecutionEvent types
  • Event batching: The rAF-based flush mechanism in useExecutionMonitor is a well-executed optimization

🔍 Issues to Address (from CodeRabbit review)

I see CodeRabbit identified 7 actionable issues that should be addressed:

1. Error Handling in Task Fetch (execution/[taskId]/page.tsx:34)

The task fetch error is caught but silently swallowed. The current fix checks for taskError state which is good, but ensure errors are properly set:

.catch((err) => {
  setTaskError(true);
  console.error('Failed to load task:', err);
});

2. Batch Status Case Sensitivity (BatchExecutionMonitor.tsx:89-98)

The polling interval checks batch status against lowercase strings but the backend uses UPPERCASE enums. This should be:

const isActive = \!['COMPLETED', 'FAILED', 'CANCELLED'].includes(batch.status);

3. Verification Regex Too Broad (VerificationEvent.tsx:19)

The current regex /pass/i matches "bypass", "password", etc. Use word boundaries:

const passed = /\bpass\b/i.test(event.message ?? '');

4. SSE State Overwrite in rAF Flush (useExecutionMonitor.ts:88-118)

The current implementation correctly guards against overwriting DISCONNECTED state (lines 104-106), but ensure this logic is consistently applied. The implementation looks correct already.

5. Early Return in Execution Landing Page (execution/page.tsx:64-83)

The resolving state flash issue - ensure early returns don't prematurely clear the loading state when workspacePath is still null.

6. Missing aria-expanded (ShellCommandEvent.tsx:28-35)

Add aria-expanded={expanded} to the toggle button for screen reader support.

7. Unused Icon Imports (ShellCommandEvent.tsx:3-4)

Remove unused CheckmarkCircle01Icon and Cancel01Icon imports.


💡 Recommendations (Nice to Have)

Memory Management

Issue: useExecutionMonitor.ts:139 creates a new array on every event:

eventsRef.current = [...eventsRef.current, event];

For long-running tasks with thousands of events, this causes memory pressure. Consider:

  • Capping with a sliding window (e.g., last 1000 events)
  • Using .push() instead of spread for large arrays
  • Lazy loading old events on scroll

Type Safety Improvements

  1. BatchResponse types (types/index.ts:125-136): Use union types instead of string:
export type BatchStatus = 'PENDING' | 'RUNNING' | 'COMPLETED' | 'PARTIAL' | 'FAILED' | 'CANCELLED';
export interface BatchResponse {
  status: BatchStatus;
  strategy: BatchStrategy;
  on_failure: 'continue' | 'stop';
  // ...
}
  1. connectionDotStyles (eventStyles.ts:129): Narrow the Record key type to SSE status values instead of string.

  2. Remove type assertions (eventStyles.ts:39, 56): TypeScript discriminated unions make as casts unnecessary - the type is automatically narrowed in each case branch.

UI/UX Polish

  1. CompletionBanner buttons: Use the shared Button component instead of raw <button> elements for consistency
  2. Agent state color differentiation: EXECUTING and COMPLETED share identical green badge colors - consider using emerald for COMPLETED
  3. Sidebar accessibility: Add role="complementary" and aria-label="Changed files" to ChangesSidebar
  4. Connection status a11y: The status dot uses title which isn't reliably announced - add role="status" to the parent div

Testing Gaps

  • Add test for completion event with missing files_modified field
  • Test edge case where pe.step is undefined in generic step rendering

🎯 Phase 3 Considerations

Since this is v2 Web UI (Phase 3) work, verify alignment with the strategic roadmap:

  • ✅ Maintains thin adapter pattern over core (good)
  • ✅ Uses existing v2 API endpoints (tasks_v2, streaming_v2)
  • ⚠️ Ensure no logic duplication with CLI - all business logic stays in codeframe/core/

The server-side streaming_v2.py looks clean - it's a proper thin adapter over core.streaming.EventPublisher.


📋 Checklist for Author

Before merging:

  • Fix the 7 actionable issues identified above
  • Consider memory management for long-running tasks
  • Tighten TypeScript types (BatchResponse, connectionDotStyles)
  • Add missing ARIA attributes for full a11y compliance
  • Verify batch status polling uses uppercase enum values
  • Remove unused icon imports
  • Run the full test suite: cd web-ui && npm test

Summary

This is high-quality work with excellent architectural decisions, particularly the rAF batching optimization and clean component separation. The main issues are minor (case sensitivity, regex precision, accessibility enhancements) and should be straightforward to fix. Once the actionable items are addressed, this will be a solid foundation for Phase 3.

Great job bridging the navigation gap from Task Board to Execution Monitor! 🚀

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
codeframe/ui/routers/streaming_v2.py (2)

84-127: ⚠️ Potential issue | 🟠 Major

heartbeat_interval parameter is accepted but never used.

The event_stream_generator accepts heartbeat_interval (default 30s) but the function body never sends heartbeat/keep-alive messages. SSE connections will time out on proxies and load balancers that enforce idle timeouts (commonly 60s). This is likely an incomplete implementation.

A typical pattern is to race publisher.subscribe against an asyncio.sleep(heartbeat_interval) and yield a comment line on timeout:

💡 Sketch of heartbeat support
 async def event_stream_generator(
     task_id: str,
     publisher: EventPublisher,
     request: Request,
     heartbeat_interval: float = 30.0,
 ) -> AsyncGenerator[str, None]:
     logger.info(f"Starting SSE stream for task {task_id}")
+    subscription = publisher.subscribe(task_id).__aiter__()
 
     try:
-        async for event in publisher.subscribe(task_id):
-            # Check if client disconnected
+        while True:
             if await request.is_disconnected():
                 logger.info(f"Client disconnected from task {task_id} stream")
                 break
 
-            yield format_sse_event(event)
-
-            # If this is a completion event, we're done
-            if event.event_type == "completion":
-                logger.info(f"Task {task_id} completed, closing stream")
-                break
+            try:
+                event = await asyncio.wait_for(
+                    subscription.__anext__(), timeout=heartbeat_interval
+                )
+                yield format_sse_event(event)
+                if event.event_type == "completion":
+                    logger.info(f"Task {task_id} completed, closing stream")
+                    break
+            except asyncio.TimeoutError:
+                yield format_sse_comment("heartbeat")
+            except StopAsyncIteration:
+                break
 
     except asyncio.CancelledError:
         logger.info(f"SSE stream cancelled for task {task_id}")
         raise

22-25: ⚠️ Potential issue | 🟡 Minor

Remove the empty router definition; this module is utilities-only.

The router at lines 22–25 is instantiated but has zero routes and adds unnecessary overhead to the app's route matching. Since the actual SSE endpoint is in tasks_v2.py (line 726), remove the empty router definition and its mount in server.py (line 477).

Additionally, the heartbeat_interval parameter at line 88 is accepted by event_stream_generator but never used in the function body—either implement heartbeat support or remove the unused parameter.

tests/ui/test_streaming_router.py (2)

11-11: ⚠️ Potential issue | 🔴 Critical

CI failure: unused TestClient import.

The pipeline reports F401 'fastapi.testclient.TestClient' imported but unused. This is breaking CI. Remove the import.

🔧 Fix
-from fastapi.testclient import TestClient

21-69: ⚠️ Potential issue | 🟡 Minor

Remove unused fixtures and unnecessary imports.

The three fixtures mock_user, mock_workspace, and app_with_streaming are unused; no test methods reference them. Removing them eliminates unnecessary imports including codeframe.auth.User, codeframe.core.workspace.Workspace, codeframe.core.tasks, and unittest.mock.MagicMock. Additionally, TestClient imported on line 11 is not used anywhere in the file.

🤖 Fix all issues with AI agents
In `@tests/ui/test_streaming_router.py`:
- Around line 162-175: Update the stale module docstring to describe the current
scope (SSE utilities and publisher tests, noting the SSE endpoint was moved to
tasks_v2) and add the v2 pytest marker; specifically, edit the module docstring
at the top of tests/ui/test_streaming_router.py to reflect the move and current
purpose, and add a module-level pytestmark = pytest.mark.v2 (or decorate
TestStreamingRouterEndpoint with `@pytest.mark.v2`) so the tests are marked as v2;
reference the TestStreamingRouterEndpoint class and the router import to ensure
context remains accurate.

In `@web-ui/src/hooks/useTaskStream.ts`:
- Around line 112-115: The SSE URL construction in useTaskStream.ts interpolates
taskId raw into the path; update the url variable to URI-encode taskId (use
encodeURIComponent(taskId)) when building
`${apiBase}/api/v2/tasks/${taskId}/stream...` so the path segment is safe,
keeping workspacePath encoded as it is and preserving the ternary that sets url
to null when taskId or workspacePath are falsy.
🧹 Nitpick comments (8)
web-ui/src/components/execution/ChangesSidebar.tsx (1)

50-66: Consider semantic list markup for accessibility.

The file list uses plain div elements. Given the PR's emphasis on accessibility (aria-live, role attributes elsewhere), using <ul>/<li> would let screen readers announce item count and improve navigation. The h-[calc(100%-37px)] is also a fragile magic number — if the header padding changes, this breaks silently.

♻️ Suggested semantic list markup
       <ScrollArea className="h-[calc(100%-37px)]">
-          <div className="p-2">
+          <ul className="list-none p-2" role="list" aria-label="Changed files">
             {changedFiles.map((filePath) => (
-              <div
+              <li
                 key={filePath}
                 className="flex items-center gap-1.5 rounded px-2 py-1 text-xs hover:bg-muted/50"
                 title={filePath}
               >
                 <FileEditIcon className="h-3 w-3 shrink-0 text-muted-foreground" />
                 <span className="truncate font-mono">{filePath}</span>
-              </div>
+              </li>
             ))}
-          </div>
+          </ul>
         </ScrollArea>
web-ui/src/components/execution/FileChangeEvent.tsx (2)

23-29: Icon color doesn't reflect the operation type.

The icon is always green, but for "Deleting file" events a red/destructive color would be more intuitive. Consider deriving the color from the matched operation:

💡 Suggested approach
-  const filePath = event.message?.replace(/^(Creating|Editing|Deleting) file:\s*/i, '') ?? '';
+  const match = event.message?.match(/^(Creating|Editing|Deleting) file:\s*(.*)/i);
+  const operation = match?.[1]?.toLowerCase() ?? '';
+  const filePath = match?.[2] ?? event.message ?? '';
+
+  const iconColor = operation === 'deleting'
+    ? 'text-red-500'
+    : operation === 'creating'
+      ? 'text-green-600'
+      : 'text-blue-500';

Then use iconColor on the FileEditIcon:

-        <FileEditIcon className="h-4 w-4 shrink-0 text-green-600" />
+        <FileEditIcon className={`h-4 w-4 shrink-0 ${iconColor}`} />

30-45: "View Diff" label sets an expectation the placeholder content doesn't fulfill.

The comment on Lines 42–43 acknowledges this is temporary. When actual diff data becomes available, consider also adding an aria-controls attribute on the button pointing to an id on the <pre> element for better screen-reader association.

For now, a small label tweak could reduce user confusion until real diffs land:

💡 Interim label suggestion
-          {expanded ? 'Hide' : 'View Diff'}
+          {expanded ? 'Hide' : 'Details'}
web-ui/src/lib/api.ts (1)

204-221: Inconsistent taskId encoding between existing and new task methods.

stopExecution and resumeExecution use encodeURIComponent(taskId) (good), but the pre-existing startExecution (Line 183), getOne (Line 153), and updateStatus (Line 168) interpolate taskId directly without encoding. While task IDs are likely safe strings today, the inconsistency could mask a bug if IDs ever contain special characters.

Consider encoding uniformly across all tasksApi methods for consistency:

♻️ Example for startExecution
   startExecution: async (
     workspacePath: string,
     taskId: string
   ): Promise<TaskStartResponse> => {
     const response = await api.post<TaskStartResponse>(
-      `/api/v2/tasks/${taskId}/start`,
+      `/api/v2/tasks/${encodeURIComponent(taskId)}/start`,
       {},
       { params: { workspace_path: workspacePath, execute: true } }
     );
web-ui/src/hooks/useTaskStream.ts (1)

117-147: Callback stability note: handleMessage depends on all six event callbacks.

If consumers pass inline (non-memoized) callbacks, handleMessage will get a new identity every render. Whether this triggers SSE reconnections depends on useEventSource's internal dependency on onMessage. This is pre-existing behavior, but worth being aware of—callers like useExecutionMonitor should ensure callbacks are stable (e.g., via useCallback).

#!/bin/bash
# Check how useEventSource depends on onMessage — does it reconnect when onMessage changes?
ast-grep --pattern $'export function useEventSource($$$) {
  $$$
}'
web-ui/__tests__/hooks/useTaskStream.test.ts (2)

9-14: Mock doesn't capture the URL, so the new workspace_path query-param logic is untested.

The useEventSource mock ignores the url argument entirely, so no test verifies that:

  1. The constructed URL includes ?workspace_path=... when both args are provided.
  2. The URL is null when taskId is set but workspacePath is null.

Consider capturing the URL to assert on it:

💡 Suggested enhancement
 const mockClose = jest.fn();
 let capturedOnMessage: ((data: string) => void) | undefined;
+let capturedUrl: string | null | undefined;

 jest.mock('@/hooks/useEventSource', () => ({
-  useEventSource: ({ onMessage }: { url: string | null; onMessage?: (data: string) => void }) => {
+  useEventSource: ({ url, onMessage }: { url: string | null; onMessage?: (data: string) => void }) => {
+    capturedUrl = url;
     capturedOnMessage = onMessage;
     return { status: 'open' as const, close: mockClose };
   },
 }));

Then add assertions like:

it('passes null URL when workspacePath is null', () => {
  renderHook(() =>
    useTaskStream({ taskId: 'task-1', workspacePath: null })
  );
  expect(capturedUrl).toBeNull();
});

it('includes workspace_path in URL', () => {
  renderHook(() =>
    useTaskStream({ taskId: 'task-1', workspacePath: '/tmp/ws' })
  );
  expect(capturedUrl).toContain('workspace_path=%2Ftmp%2Fws');
});

22-27: Test covers taskId: null but not the workspacePath: null (with valid taskId) edge case.

The hook should produce a null URL when only workspacePath is null. Adding a test for { taskId: 'task-1', workspacePath: null } would strengthen coverage of the guard at Line 113 in the source.

codeframe/ui/routers/streaming_v2.py (1)

14-15: APIRouter import is only used for the empty router; StreamingResponse relies on downstream re-import.

APIRouter is only used to create the empty router on line 22. If the router is removed (per the earlier suggestion), this import becomes dead code. The StreamingResponse re-export (with # noqa: F401) is fine as long as consumers actually import it from here.

Comment on lines 162 to +175
class TestStreamingRouterEndpoint:
"""Tests for streaming router endpoint configuration."""
"""Tests for streaming router configuration.

def test_endpoint_exists(self, app_with_streaming):
"""The streaming endpoint should be registered."""
client = TestClient(app_with_streaming)
NOTE: The SSE stream endpoint (GET /api/v2/tasks/{task_id}/stream) was
moved to tasks_v2.py to avoid a route collision with the JWT-auth version.
The tasks_v2 version only requires workspace_path, making it compatible
with browser EventSource which cannot send custom auth headers.
"""

# Get the OpenAPI schema to verify endpoint exists
response = client.get("/openapi.json")
assert response.status_code == 200
def test_streaming_router_has_no_endpoints(self):
"""streaming_v2 router should have no endpoints (utilities only)."""
from codeframe.ui.routers.streaming_v2 import router

schema = response.json()
paths = schema.get("paths", {})

# Verify the stream endpoint is registered
assert "/api/v2/tasks/{task_id}/stream" in paths
assert "get" in paths["/api/v2/tasks/{task_id}/stream"]

def test_endpoint_returns_streaming_response(self, app_with_streaming):
"""The endpoint should return a streaming response with SSE content type."""
from codeframe.core.streaming import EventPublisher
from codeframe.ui.routers import streaming_v2

# Inject a publisher that immediately completes the task
publisher = EventPublisher()
streaming_v2.set_event_publisher(publisher)

try:
client = TestClient(app_with_streaming)

# Use stream=True but set a short timeout via the client
# and complete the task immediately
import threading
import time

def complete_task():
time.sleep(0.1)
import asyncio
loop = asyncio.new_event_loop()
loop.run_until_complete(publisher.complete_task("test-task"))
loop.close()

thread = threading.Thread(target=complete_task)
thread.start()

# The TestClient doesn't easily support streaming,
# so we just verify the endpoint starts without error
# Real streaming tests require async client
with client.stream("GET", "/api/v2/tasks/test-task/stream") as response:
assert response.status_code == 200
assert "text/event-stream" in response.headers.get("content-type", "")
# Read at most a small amount before breaking
break_after = 0
for _ in response.iter_lines():
break_after += 1
if break_after > 0:
break

thread.join(timeout=2.0)
finally:
streaming_v2.set_event_publisher(None)
assert len(router.routes) == 0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

The endpoint-absence test is useful but the class docstring and module docstring are now stale.

The module docstring (lines 1-5) still says "Tests for… the /api/v2/tasks/{task_id}/stream SSE endpoint", and the class docstring mentions the move but the module-level doc doesn't reflect it. Consider updating the module docstring to match the current scope (SSE utilities + publisher tests).

Also, per coding guidelines for tests/**/*.py: tests should be marked with @pytest.mark.v2 or a module-level pytestmark = pytest.mark.v2. This file has no such marker.

💡 Suggested additions
 """Tests for SSE streaming router.
 
-TDD: Tests written first to define expected behavior of
-the /api/v2/tasks/{task_id}/stream SSE endpoint.
+Tests for shared SSE utilities (formatting, publisher management)
+and verification that the streaming_v2 router has no endpoints.
 """
+
+import pytest
+
+pytestmark = pytest.mark.v2

As per coding guidelines, tests/**/*.py: "Mark v2 tests with @pytest.mark.v2 decorator or pytestmark = pytest.mark.v2 module-level declaration."

🤖 Prompt for AI Agents
In `@tests/ui/test_streaming_router.py` around lines 162 - 175, Update the stale
module docstring to describe the current scope (SSE utilities and publisher
tests, noting the SSE endpoint was moved to tasks_v2) and add the v2 pytest
marker; specifically, edit the module docstring at the top of
tests/ui/test_streaming_router.py to reflect the move and current purpose, and
add a module-level pytestmark = pytest.mark.v2 (or decorate
TestStreamingRouterEndpoint with `@pytest.mark.v2`) so the tests are marked as v2;
reference the TestStreamingRouterEndpoint class and the router import to ensure
context remains accurate.

Comment on lines +112 to +115
const url =
taskId && workspacePath
? `${apiBase}/api/v2/tasks/${taskId}/stream?workspace_path=${encodeURIComponent(workspacePath)}`
: null;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

taskId is not URI-encoded in the SSE URL path segment.

workspacePath is correctly encoded as a query parameter, but taskId is interpolated raw into the path. In api.ts, the new stopExecution/resumeExecution methods encode taskId via encodeURIComponent. The same should be done here to prevent broken URLs or path traversal if a task ID ever contains special characters (e.g., /, #, %).

🔧 Proposed fix
   const url =
     taskId && workspacePath
-      ? `${apiBase}/api/v2/tasks/${taskId}/stream?workspace_path=${encodeURIComponent(workspacePath)}`
+      ? `${apiBase}/api/v2/tasks/${encodeURIComponent(taskId)}/stream?workspace_path=${encodeURIComponent(workspacePath)}`
       : null;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const url =
taskId && workspacePath
? `${apiBase}/api/v2/tasks/${taskId}/stream?workspace_path=${encodeURIComponent(workspacePath)}`
: null;
const url =
taskId && workspacePath
? `${apiBase}/api/v2/tasks/${encodeURIComponent(taskId)}/stream?workspace_path=${encodeURIComponent(workspacePath)}`
: null;
🤖 Prompt for AI Agents
In `@web-ui/src/hooks/useTaskStream.ts` around lines 112 - 115, The SSE URL
construction in useTaskStream.ts interpolates taskId raw into the path; update
the url variable to URI-encode taskId (use encodeURIComponent(taskId)) when
building `${apiBase}/api/v2/tasks/${taskId}/stream...` so the path segment is
safe, keeping workspacePath encoded as it is and preserving the ternary that
sets url to null when taskId or workspacePath are falsy.

: 'CONNECTING';

// Don't let rAF flush overwrite DISCONNECTED state
if (sseStatusRef.current === 'error' || sseStatusRef.current === 'closed') {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low

hooks/useExecutionMonitor.ts:104 The DISCONNECTED override in scheduleFlush doesn't check if the task completed. When SSE closes after a CompletionEvent, this overwrites the correct COMPLETED state with DISCONNECTED. Consider checking for completion before overriding (e.g., if (!completion && (sseStatusRef.current === 'error' || ...))).

Suggested change
if (sseStatusRef.current === 'error' || sseStatusRef.current === 'closed') {
if (!completion && (sseStatusRef.current === 'error' || sseStatusRef.current === 'closed')) {

🚀 Want me to fix this? Reply ex: "fix it for me".

// ── Event callbacks ───────────────────────────────────────────────
const onEvent = useCallback(
(event: ExecutionEvent) => {
eventsRef.current = [...eventsRef.current, event];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low

hooks/useExecutionMonitor.ts:139 Spreading the entire array on every event causes O(N²) copying. Consider using push() instead for O(1) appends.

Suggested change
eventsRef.current = [...eventsRef.current, event];
eventsRef.current.push(event);

🚀 Want me to fix this? Reply ex: "fix it for me".

}, [batch?.status, fetchBatch]);

// Auto-expand the first IN_PROGRESS task
useEffect(() => {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low

execution/BatchExecutionMonitor.tsx:102 The auto-expand effect re-runs when expandedTaskId becomes null (after user collapse), immediately re-expanding the IN_PROGRESS task. Consider using a ref to track if auto-expansion has already occurred, so user actions aren't overridden.

🚀 Want me to fix this? Reply ex: "fix it for me".

@claude

claude Bot commented Feb 6, 2026

Copy link
Copy Markdown

Code Review: Execution Monitor View (#332)

Overview

This is a well-architected implementation of real-time execution monitoring with SSE streaming. The PR successfully delivers both single-task and batch execution monitoring with comprehensive test coverage (53 new tests). The code demonstrates strong engineering practices with clear separation of concerns and thoughtful performance optimizations.


✅ Strengths

Architecture & Performance

  1. Smart rAF batching in useExecutionMonitor: The hook uses requestAnimationFrame to batch rapid SSE events, preventing excessive re-renders. This is exactly the right pattern for high-frequency updates (web-ui/src/hooks/useExecutionMonitor.ts:91-125).

  2. Excellent state separation: Agent state derivation logic is cleanly isolated in eventStyles.ts, making it testable and reusable across components.

  3. Proper SSE cleanup: The event_stream_generator correctly handles client disconnection and completion events (codeframe/ui/routers/streaming_v2.py:109-118).

  4. Stable polling pattern: Batch monitor uses ref-based task deduplication to prevent redundant fetches (web-ui/src/components/execution/BatchExecutionMonitor.tsx:59-79).

Testing

  1. Comprehensive test coverage: 53 new tests with excellent coverage of edge cases (heartbeat handling, state transitions, completion scenarios, error states).

  2. Well-structured test fixtures: Clean helper functions like makeProgressEvent and makeCompletionEvent make tests readable and maintainable.

UX

  1. Thoughtful accessibility: Proper ARIA labels on blocker textarea and collapsible elements.

  2. Clear visual feedback: Completion banners with contextual actions, connection status indicators, and progress tracking.


⚠️ Issues to Address

Critical

1. Unused heartbeat_interval parameter (codeframe/ui/routers/streaming_v2.py:88)

async def event_stream_generator(
    task_id: str,
    publisher: EventPublisher,
    request: Request,
    heartbeat_interval: float = 30.0,  # ⚠️ Accepted but never used
):

Impact: This is misleading - callers might think they can control heartbeat timing, but the parameter has no effect.

Recommendation: Either implement heartbeat generation or remove the parameter. If heartbeats are handled elsewhere (in the EventPublisher), document this clearly.

2. Silent error handling in task fetching (web-ui/src/app/execution/[taskId]/page.tsx:34)

tasksApi.getOne(workspacePath, taskId).then(setTask).catch(() => {
  setTaskError(true);  // ⚠️ Error details lost
});

Impact: Users see "Task not found" but developers have no way to debug why it failed.

Recommendation: Log the error or include it in state:

.catch((err) => {
  console.error('Failed to load task:', err);
  setTaskError(true);
});

Code Quality

3. Magic number: polling interval (web-ui/src/components/execution/BatchExecutionMonitor.tsx:94)

pollRef.current = setInterval(fetchBatch, 5000);  // ⚠️ Magic number

Recommendation: Extract to a named constant:

const BATCH_POLL_INTERVAL_MS = 5000;
pollRef.current = setInterval(fetchBatch, BATCH_POLL_INTERVAL_MS);

4. Potential memory leak in batch monitor (web-ui/src/components/execution/BatchExecutionMonitor.tsx:73-77)
The promise chain for tasksApi.getOne isn't cancelled when the component unmounts. If the user navigates away before these promises resolve, you'll call setTasks on an unmounted component.

Recommendation: Use AbortController or track mounted state:

const mountedRef = useRef(true);
useEffect(() => {
  return () => { mountedRef.current = false; };
}, []);

// Then in the fetch:
tasksApi.getOne(workspacePath, taskId).then((task) => {
  if (mountedRef.current) {
    setTasks((prev) => ({ ...prev, [taskId]: task }));
  }
});

5. Inconsistent error handling (web-ui/src/components/execution/BatchExecutionMonitor.tsx:80,115,124)
Some error handlers just swallow errors (.catch(() => {})), while others set error state. This makes debugging difficult.

Recommendation: Always log errors at minimum:

.catch((err) => {
  console.error('Failed to fetch task:', err);
});

💡 Suggestions (Non-blocking)

Performance

  1. Consider debouncing batch polling: 5-second polling is aggressive. When the batch has many tasks, you could increase the interval or use exponential backoff:
const pollInterval = isActive ? 
  (batch.task_ids.length > 10 ? 10000 : 5000) : 
  null;
  1. Add loading state to stop/cancel buttons: The batch monitor's stop/cancel handlers don't show loading state, so users can't tell if their click was registered.

Code Organization

  1. Extract status config to shared utilities: statusConfig in BatchExecutionMonitor.tsx:31-38 could be useful in other components. Consider moving to lib/taskUtils.ts.

  2. Type safety for event types: The deriveAgentState function would benefit from exhaustive type checking:

function deriveAgentState(event: ExecutionEvent): UIAgentState {
  switch (event.event_type) {
    case 'progress': return mapPhaseToState(event.data.phase);
    case 'error': return 'FAILED';
    case 'blocker': return 'BLOCKED';
    case 'completion': return mapCompletionToState(event.data.status);
    case 'output': return 'EXECUTING';
    case 'heartbeat': return 'CONNECTING';
    default: {
      const _exhaustive: never = event;  // Type error if case is missing
      return 'CONNECTING';
    }
  }
}

Documentation

  1. Add JSDoc to useExecutionMonitor: The hook is complex and would benefit from usage examples:
/**
 * @example
 * const monitor = useExecutionMonitor('task-123', '/path/to/workspace');
 * 
 * // Access accumulated state
 * console.log(monitor.agentState);  // 'EXECUTING'
 * console.log(monitor.events);      // All events
 * 
 * // Close connection when done
 * monitor.close();
 */

🔒 Security Review

No significant security concerns, but note:

  • ✅ API calls use workspace_path from localStorage (user-scoped)
  • ✅ No XSS vectors - React escapes all user content
  • ✅ Blocker answers are properly sanitized (trimmed)
  • ⚠️ Consider adding rate limiting to blocker submission to prevent spam

📊 Test Coverage Analysis

Coverage is excellent - 53 new tests covering:

  • ✅ rAF batching behavior
  • ✅ SSE connection lifecycle (open, error, closed)
  • ✅ All agent state transitions
  • ✅ Event accumulation and derivation
  • ✅ Blocker form submission
  • ✅ Component rendering

Minor gap: No integration test for the full SSE → hook → UI flow. Consider adding a Playwright test that:

  1. Starts a task execution
  2. Verifies events appear in the UI
  3. Answers a blocker
  4. Confirms completion banner appears

🎯 Alignment with Project Standards (CLAUDE.md)

Follows v2 architecture: Frontend properly consumes Phase 2 server APIs
No v1 dependencies: Clean break from legacy server/UI
Test-first approach: Tests define behavior contracts
Proper separation: SSE utilities in streaming_v2.py, endpoint in tasks_v2.py
⚠️ Legacy UI work: This is Phase 3 work (Web UI Rebuild), which is technically after current phase priorities, but it's well-executed and doesn't break Golden Path


Summary & Recommendation

This PR represents high-quality frontend engineering with:

  • ✅ Smart performance optimizations (rAF batching, stable refs)
  • ✅ Comprehensive test coverage (253 tests passing)
  • ✅ Thoughtful UX (accessibility, visual feedback, error states)

Recommendation: APPROVE with minor fixes

Required before merge:

  1. Fix Add Claude Code GitHub Workflow #1 (heartbeat_interval unused parameter)
  2. Fix feat: Project schema refactoring with source types and workspace management #4 (memory leak in batch monitor)

Encouraged but optional:

Great work on this implementation! The execution monitor significantly improves the developer experience for CodeFRAME users.


cc: @frankbria - Let me know if you'd like me to elaborate on any of these points or if you have questions about the recommendations.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@web-ui/src/components/execution/BatchExecutionMonitor.tsx`:
- Around line 63-82: The polling logic currently clears and sets the global
error state on every background poll, letting a transient failure replace the
whole batch UI; change fetchBatch to accept an isInitial boolean (or use an
isInitialFetchRef) and only mutate error state for the initial load: remove the
unconditional setError(null) at the top of fetchBatch, call fetchBatch(true) on
mount and fetchBatch(false) for periodic polls, and inside fetchBatch only call
setError('Failed to load batch details') when isInitial is true (and clear error
only on a successful initial fetch). Keep the task-fetching logic
(fetchedTaskIdsRef, tasksApi.getOne, setTasks) unchanged.
- Around line 102-108: The effect auto-expanding IN_PROGRESS tasks overrides a
user's manual collapse because expandedTaskId is in the dependency array; add a
ref (e.g., userInteractedRef) to track manual toggles, set
userInteractedRef.current = true inside the task toggle handler (the function
that calls setExpandedTaskId), then update the useEffect that auto-expands (the
useEffect referencing batch, expandedTaskId and calling setExpandedTaskId) to
skip auto-expand when userInteractedRef.current is true; also reset
userInteractedRef.current = false when a new batch arrives (e.g., when batch.id
changes) so auto-expand can run for new batches.
🧹 Nitpick comments (2)
web-ui/src/components/execution/BatchExecutionMonitor.tsx (2)

269-308: Events disappear when a task completes while expanded.

When status transitions from IN_PROGRESS to COMPLETED, shouldStream becomes false and the EventStream is replaced with the static "Task completed successfully." message. The user loses visibility of all accumulated events mid-view.

Consider keeping the EventStream visible (read-only) for completed tasks when the user already has the row expanded, e.g.:

- const shouldStream = isExpanded && (status === 'IN_PROGRESS' || status === 'BLOCKED');
+ const shouldStream = status === 'IN_PROGRESS' || status === 'BLOCKED';
+ const showEvents = isExpanded && (shouldStream || monitor.events.length > 0);

Then render EventStream whenever showEvents is true, so historical events remain visible after completion.


86-88: Consider adding fetchBatch to the dependency array instead of disabling the lint rule.

Since fetchBatch is a useCallback with [workspacePath, batchId] as deps, adding it to the effect's dependency array is semantically equivalent and avoids the eslint-disable comment.

  useEffect(() => {
    fetchBatch();
- }, [batchId, workspacePath]); // eslint-disable-line react-hooks/exhaustive-deps
+ }, [fetchBatch]);

Comment on lines +63 to +82
const fetchBatch = useCallback(async () => {
try {
setError(null);
const data = await batchesApi.get(workspacePath, batchId);
setBatch(data);

// Fetch task details for any new task IDs (check ref, not state)
for (const taskId of data.task_ids) {
if (!fetchedTaskIdsRef.current.has(taskId)) {
fetchedTaskIdsRef.current.add(taskId);
tasksApi.getOne(workspacePath, taskId).then((task) => {
setTasks((prev) => ({ ...prev, [taskId]: task }));
}).catch((err) => {
console.error(`Failed to fetch task ${taskId}:`, err);
});
}
}
} catch {
setError('Failed to load batch details');
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Transient poll failure replaces the entire batch view with an error message.

setError(null) on line 65 clears any previous error at the start of every poll, and a subsequent failure on line 81 sets a new error. Because the render on line 131 short-circuits to show only the error box when error is set, a single transient network hiccup during polling will hide all batch progress from the user.

Consider either:

  1. Only showing the error as a non-blocking toast/banner alongside the batch UI, or
  2. Not clearing/setting error during background polls (only on the initial fetch).
Option 2: skip error-state mutation during polling

One approach — track whether this is the initial fetch vs. a background poll, and only surface errors prominently on the initial load:

- const fetchBatch = useCallback(async () => {
+ const fetchBatch = useCallback(async (isInitial = false) => {
     try {
-      setError(null);
+      if (isInitial) setError(null);
       const data = await batchesApi.get(workspacePath, batchId);
       setBatch(data);
       // ... task fetching ...
     } catch {
-      setError('Failed to load batch details');
+      if (isInitial) setError('Failed to load batch details');
+      else console.error('Background poll failed for batch', batchId);
     }
   }, [workspacePath, batchId]);

Then in the initial-fetch effect:

   useEffect(() => {
-    fetchBatch();
+    fetchBatch(true);
   }, [batchId, workspacePath]);

Also applies to: 131-137

🤖 Prompt for AI Agents
In `@web-ui/src/components/execution/BatchExecutionMonitor.tsx` around lines 63 -
82, The polling logic currently clears and sets the global error state on every
background poll, letting a transient failure replace the whole batch UI; change
fetchBatch to accept an isInitial boolean (or use an isInitialFetchRef) and only
mutate error state for the initial load: remove the unconditional setError(null)
at the top of fetchBatch, call fetchBatch(true) on mount and fetchBatch(false)
for periodic polls, and inside fetchBatch only call setError('Failed to load
batch details') when isInitial is true (and clear error only on a successful
initial fetch). Keep the task-fetching logic (fetchedTaskIdsRef,
tasksApi.getOne, setTasks) unchanged.

Comment on lines +102 to +108
useEffect(() => {
if (!batch || expandedTaskId) return;
const inProgress = batch.task_ids.find(
(id) => batch.results[id] === 'IN_PROGRESS'
);
if (inProgress) setExpandedTaskId(inProgress);
}, [batch, expandedTaskId]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Auto-expand overrides user's manual collapse of an IN_PROGRESS task.

When the user collapses an expanded task (setting expandedTaskId to null), this effect immediately re-runs because expandedTaskId is in the dependency array. If there's still an IN_PROGRESS task, it re-expands, making it impossible for the user to collapse it.

A simple fix is to track whether the user has manually toggled, and skip auto-expand after that.

Proposed fix using a "user has interacted" ref
+ const userToggledRef = useRef(false);
+
  // Auto-expand the first IN_PROGRESS task
  useEffect(() => {
-   if (!batch || expandedTaskId) return;
+   if (!batch || expandedTaskId || userToggledRef.current) return;
    const inProgress = batch.task_ids.find(
      (id) => batch.results[id] === 'IN_PROGRESS'
    );
    if (inProgress) setExpandedTaskId(inProgress);
  }, [batch, expandedTaskId]);

Then in the toggle handler (line 237–238):

  onToggle={() => {
+   userToggledRef.current = true;
    setExpandedTaskId(expandedTaskId === taskId ? null : taskId);
  }}
🤖 Prompt for AI Agents
In `@web-ui/src/components/execution/BatchExecutionMonitor.tsx` around lines 102 -
108, The effect auto-expanding IN_PROGRESS tasks overrides a user's manual
collapse because expandedTaskId is in the dependency array; add a ref (e.g.,
userInteractedRef) to track manual toggles, set userInteractedRef.current = true
inside the task toggle handler (the function that calls setExpandedTaskId), then
update the useEffect that auto-expands (the useEffect referencing batch,
expandedTaskId and calling setExpandedTaskId) to skip auto-expand when
userInteractedRef.current is true; also reset userInteractedRef.current = false
when a new batch arrives (e.g., when batch.id changes) so auto-expand can run
for new batches.

@claude

claude Bot commented Feb 6, 2026

Copy link
Copy Markdown

Code Review: Execution Monitor View (#332)

Overview

This is an excellent implementation of real-time execution monitoring with SSE streaming. The PR successfully delivers both single-task and batch execution monitoring with comprehensive test coverage (53 new tests, 253 total passing). The architecture demonstrates strong engineering practices, and the iterative improvements from previous review rounds show excellent responsiveness to feedback.

✅ Strengths

Architecture & Design Patterns

  • rAF batching in useExecutionMonitor (lines 91-125): Brilliant performance optimization preventing excessive re-renders during rapid event streams
  • Ref-based task deduplication in BatchExecutionMonitor (line 60): Prevents polling interval reset - smart use of refs vs state
  • Clear separation of concerns: Event parsing → state derivation → UI rendering
  • Composable hook design: useTaskStream → useExecutionMonitor → page components

State Management

  • Proper state derivation from events (not storing duplicative state)
  • Handles SSE lifecycle correctly (CONNECTING → open → error/closed → DISCONNECTED)
  • Protection against stale closures via refs (lines 103-106 in useExecutionMonitor.ts)

Accessibility

  • aria-live, role attributes on all interactive elements
  • Keyboard shortcuts (Ctrl+Enter for blocker submission)
  • Comprehensive ARIA support throughout

Error Handling

  • Try/catch on all async operations with console.error logging
  • Error states with user-friendly messages
  • Graceful degradation (task-not-found state, SSE disconnection banner)

Test Coverage

  • 253 tests passing across 24 suites
  • Comprehensive coverage: hooks, components, event formatting, utilities
  • Edge cases covered (heartbeat filtering, taskId reset, completion detection)

Backend Cleanup

  • Removed duplicate SSE route from streaming_v2.py
  • Clean utility-only module design

🔍 Areas for Consideration

1. Reconnection Logic (Medium Priority)
Location: web-ui/src/app/execution/[taskId]/page.tsx:116-126

The disconnection banner uses window.location.reload() which loses accumulated events and scroll position. SSE connections naturally reconnect on transient failures. Consider manual close() → re-mount pattern or document why reload is needed.

2. Silent Catch Handlers (Low Priority)
Locations: BatchExecutionMonitor.tsx, page.tsx

Added console.error per previous reviews, but some catch blocks don't update UI error state. Users won't know why a task row is empty. Consider inline error indicators.

3. Type Safety - Event Discrimination
Location: web-ui/src/components/execution/EventItem.tsx:20-80

Event type discrimination via string checks is safe but fragile if backend message formats change. Consider structured event subtypes from backend as a future enhancement.

🎯 CodeFRAME Alignment ✅

  • Phase 2 server layer pattern: Thin adapter over core
  • No core imports in UI: Web UI properly isolated
  • API follows v2 conventions: All endpoints use /api/v2/* prefix
  • Tests marked v2 properly
  • V2 Strategic Roadmap: Phase 3 feature built correctly on Phase 2

🛡️ Security ✅

  • encodeURIComponent used on all URL path interpolations
  • Blocker answer trimmed before submission
  • XSS protection via React defaults
  • No dangerouslySetInnerHTML or eval usage

📊 Performance

Optimizations Present ✅

  1. rAF batching prevents render thrashing
  2. Ref-based dedup prevents polling interval churn
  3. Conditional SSE connection
  4. Stable polling with useCallback

Potential Concerns

  • useExecutionMonitor accumulates all events in memory. For long-running tasks (1000+ events), consider windowing
  • If user expands 50 tasks simultaneously, 50 SSE connections open. Consider limiting concurrent streams

🎉 Summary

Verdict: Approve with minor suggestions

This PR demonstrates excellent engineering:

  • Thoughtful performance optimizations (rAF batching, ref-based dedup)
  • Comprehensive test coverage (253 tests passing)
  • Iterative improvement (6 commits addressing review findings)
  • Strong accessibility support
  • Clean architecture following v2 patterns

Minor improvements to consider:

  1. Enhance reconnection UX (avoid full page reload if possible)
  2. Add inline error states for failed task fetches
  3. Verify verification regex against actual backend messages
  4. Consider event windowing for very long-running tasks

Overall: Ready to merge. The suggestions above are non-blocking polish items that can be addressed in follow-up PRs if needed.


Reviewed by: Claude Sonnet 4.5 (code-reviewer)
Date: 2026-02-06

{isExpanded && (
<div className="border-t px-4 pb-4 pt-2">
{shouldStream ? (
<div className="h-64">

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low

execution/BatchExecutionMonitor.tsx:294 EventStream uses flex-1 and h-full internally, but the parent div is a block with fixed height. Consider adding flex flex-col overflow-hidden to the wrapper so the internal layout resolves correctly and scrolling works.

Suggested change
<div className="h-64">
<div className="flex h-64 flex-col overflow-hidden">

🚀 Want me to fix this? Reply ex: "fix it for me".

Comment on lines +65 to +68
if (batchId || taskIdParam || !workspacePath) {
setResolving(false);
return;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low

execution/page.tsx:65 When workspacePath becomes available after hydration, resolving is already false (set during the initial mount when !workspacePath was true). Consider resetting resolving to true before starting the API call, or restructuring the condition so the fetch effect only sets resolving(false) after attempting the API call.

-    if (batchId || taskIdParam || !workspacePath) {
-      setResolving(false);
+    if (batchId || taskIdParam) {
+      setResolving(false);
+      return;
+    }
+
+    if (!workspacePath) {
       return;
     }

🚀 Want me to fix this? Reply ex: "fix it for me".

Comment on lines +85 to +88
// Initial fetch
useEffect(() => {
fetchBatch();
}, [batchId, workspacePath]); // eslint-disable-line react-hooks/exhaustive-deps

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low

execution/BatchExecutionMonitor.tsx:85 Consider resetting fetchedTaskIdsRef and tasks state when batchId changes, otherwise stale task data persists across batch switches during client-side navigation.

-  // Initial fetch
-  useEffect(() => {
-    fetchBatch();
-  }, [batchId, workspacePath]); // eslint-disable-line react-hooks/exhaustive-deps
+  // Reset cached data and fetch when batch changes
+  useEffect(() => {
+    fetchedTaskIdsRef.current.clear();
+    setTasks({});
+    fetchBatch();
+  }, [batchId, workspacePath]); // eslint-disable-line react-hooks/exhaustive-deps

🚀 Want me to fix this? Reply ex: "fix it for me".

Test User added 2 commits February 6, 2026 13:41
Three issues prevented SSE events from reaching the frontend:
- /stream endpoint sent raw text lines, not structured JSON events
- start_single_task didn't pass event_publisher to execute_agent
- Synchronous execution blocked before SSE could connect

Changes:
- Rename raw output endpoint to GET /{task_id}/output
- Add structured SSE endpoint at GET /{task_id}/stream using EventPublisher
- Run agent in background thread with event_publisher wired through
- Add 15s heartbeat comments to event_stream_generator
- Update streaming tests for renamed paths
Combine error + detail fields in structured API errors so the UI
displays "Cannot execute: No tasks ready" instead of just the summary.
@claude

claude Bot commented Feb 6, 2026

Copy link
Copy Markdown

Code Review: Execution Monitor View with Real-Time SSE Streaming

This is a comprehensive and well-structured implementation that delivers a complete real-time execution monitoring experience. The code quality is high, with strong attention to accessibility, performance optimization, and test coverage.


Strengths

1. Excellent Architecture & Separation of Concerns

  • Clean hook composition pattern with useExecutionMonitor building on useTaskStream
  • Proper separation between event accumulation (refs) and display state (useState)
  • Smart use of requestAnimationFrame batching to prevent excessive re-renders during rapid event streams
  • Well-isolated components with clear, single responsibilities

2. Performance Optimizations

  • rAF batching in useExecutionMonitor (lines 91-125) prevents per-event renders
  • Ref-based task deduplication in BatchExecutionMonitor (fetchedTaskIdsRef, line 60) prevents redundant API calls
  • Smart polling strategy (line 90-99) that only polls active batches
  • Proper cleanup of intervals and animation frames in useEffect returns

3. Accessibility

  • Proper ARIA labels on interactive elements (aria-label on blocker textarea line 77, event stream line 80)
  • aria-live="polite" on event log (EventStream.tsx:79) for screen reader announcements
  • aria-expanded attributes on collapsible elements
  • Semantic HTML with role="log" on event stream

4. User Experience

  • Auto-scroll with smart pause on user scroll-up
  • "New events" floating button when scrolled away (EventStream.tsx:103-114)
  • Loading states with proper spinner indicators
  • Keyboard shortcuts (Ctrl/Cmd+Enter to submit blocker answers, line 81-84)
  • Clear visual feedback for connection states and agent phases

5. Error Handling

  • Try-catch blocks on all API calls
  • Graceful error messages displayed to users
  • Proper error state management with TypeScript ApiError type
  • Fallback UI states for missing workspace/task

6. Test Coverage

  • 53 new tests covering hooks, components, and edge cases
  • Mock strategy for requestAnimationFrame with queue-based flushing
  • Tests cover state derivation, completion detection, heartbeat filtering, and blocker submission
  • Good use of dispatchAndFlush helper for deterministic async testing

🔍 Areas for Consideration

1. Potential Memory Leak in Event Accumulation

Location: useExecutionMonitor.ts:139

eventsRef.current = [...eventsRef.current, event];

Issue: Events accumulate unbounded in memory. For long-running tasks with verbose output, this could grow to thousands of events.

Recommendation: Consider adding a sliding window (e.g., keep last 500 events) or implement virtual scrolling for very long event streams.

// Example fix
eventsRef.current = [...eventsRef.current, event].slice(-500);

2. XSS Risk in Event Display

Location: Event components display user/server-provided content

Issue: While React escapes strings by default, the question, context, and message fields from events are displayed directly. If these ever contain HTML or come from untrusted sources, there's potential XSS risk.

Current State: Likely safe due to React's automatic escaping, but worth verifying the backend sanitizes these fields.

Recommendation: Add explicit sanitization or use a library like DOMPurify if rich text is ever needed.

3. Race Condition in Batch Polling

Location: BatchExecutionMonitor.tsx:63-83

Issue: The fetchBatch callback is memoized with workspacePath and batchId dependencies, but the polling interval (line 94) references fetchBatch which changes on every render if those values change.

Current Impact: Minimal since workspace/batch rarely change, but could cause duplicate requests during transitions.

Recommendation: Wrap fetchBatch with useCallback and explicitly list all dependencies, or use a ref-based approach like the task deduplication pattern.

4. Missing Error Boundary

Location: Execution pages and BatchExecutionMonitor

Issue: If any component throws during render (e.g., malformed event data), the entire page crashes with no recovery.

Recommendation: Wrap the main execution monitor components in an Error Boundary that shows a "Something went wrong" message and allows retry.

5. SSE Reconnection Strategy

Location: useExecutionMonitor.ts:158-171

Issue: When SSE connection is lost (status === 'error' || 'closed'), the UI shows DISCONNECTED state but doesn't automatically attempt reconnection.

Current State: Works for manual refresh, but users might not understand they need to refresh.

Recommendation: Consider adding automatic exponential backoff reconnection, or a "Reconnect" button in the DISCONNECTED state.

6. Type Safety on Event Types

Location: eventStyles.ts:36-67

Issue: Type assertions without guards:

const { phase } = event as ProgressEvent;

Recommendation: Add runtime type guards or use discriminated unions more safely:

if (event.event_type === 'progress' && 'phase' in event) {
  const { phase } = event;
  // ...
}

🔒 Security Review

Good Practices

  • All API calls use encodeURIComponent for path parameters (api.ts:209, 218, 268, etc.)
  • User input is trimmed before submission (BlockerEvent.tsx:33)
  • Workspace path validation with graceful fallback (page.tsx:54-72)
  • No direct DOM manipulation or dangerouslySetInnerHTML

⚠️ Minor Concerns

  • Session/Token Management: Not visible in this PR, but ensure API calls include proper authentication headers
  • Rate Limiting: No client-side rate limiting on blocker submission (could be spammed, though backend likely has limits)

📊 Test Quality Assessment

Coverage: Excellent for core logic

  • ✅ Hook state management and rAF batching
  • ✅ Event accumulation and derivation
  • ✅ Blocker submission flow
  • ✅ Style utility functions

Gaps (consider adding):

  • 🔵 Error states (API failures, network errors)
  • 🔵 Edge cases (taskId changes mid-stream, duplicate events)
  • 🔵 Integration tests for page-level routing and workspace hydration
  • 🔵 Accessibility tests (aria attributes, keyboard navigation)

🎨 Code Quality & Best Practices

Excellent

  • Consistent TypeScript usage with proper type imports
  • Clear component and function naming
  • Comprehensive JSDoc comments on hooks
  • Proper use of React hooks (dependencies, cleanup)
  • Tailwind CSS classes well-organized
  • No console.logs left in production code (except intentional error logging)

🔵 Minor Suggestions

  1. Magic Numbers: Extract polling interval (5000ms) and scroll threshold (40px) to constants
  2. Duplicate Code: Status icon config appears in both BatchExecutionMonitor (line 31) and eventStyles.ts - could be centralized
  3. Component Size: BatchExecutionMonitor is 299 lines - consider extracting BatchHeader and BatchTaskRow sub-components

🚀 Performance Considerations

Strengths

  • Virtual scrolling not needed yet, but event windowing (point Add Claude Code GitHub Workflow #1) would future-proof
  • Proper memoization with useCallback on event handlers
  • No unnecessary re-renders detected in the rAF batching pattern

Watch Out For

  • Large batch executions (50+ tasks) might strain the accordion rendering - consider lazy rendering of collapsed rows
  • File change sidebar doesn't limit display count - could be hundreds of files for large migrations

📝 Documentation & Maintainability

  • ✅ Clear inline comments explaining complex logic (rAF batching, scroll detection)
  • ✅ Type definitions exported for external use
  • ✅ README-quality PR description with implementation checklist
  • 🔵 Consider adding Storybook stories for event components to aid visual testing

Final Verdict

Approval Status: ✅ Approve with Minor Suggestions

This PR demonstrates excellent engineering practices and delivers a polished feature. The identified issues are minor and mostly preventative (memory management, error boundaries) rather than critical bugs.

Recommended Actions Before Merge

  1. High Priority: Add event windowing to prevent unbounded memory growth
  2. Medium Priority: Wrap pages in Error Boundaries
  3. Low Priority: Extract magic numbers to constants

Post-Merge Follow-ups

  • Monitor production metrics for long-running task memory usage
  • Add E2E tests for full execution flow
  • Consider SSE reconnection UX improvements

Great work on this feature! The attention to accessibility, performance, and user experience really shows. 🎉

@claude

claude Bot commented Feb 6, 2026

Copy link
Copy Markdown

Code Review - PR #339: Execution Monitor View

Overview

This PR implements a comprehensive real-time execution monitoring UI for the CodeFRAME web interface, adding SSE-based streaming, live event rendering, and batch execution monitoring. The implementation is well-architected with 19 new files, 53 new tests (253 total passing), and follows React/Next.js best practices.


✅ Strengths

1. Excellent Architecture & Separation of Concerns

  • Clean hook composition: useExecutionMonitor properly composes useTaskStream with requestAnimationFrame batching to prevent render thrashing
  • Event-driven design: The use of SSE for real-time updates is appropriate and well-implemented
  • Centralized styling: eventStyles.ts provides a single source of truth for UI state mapping
  • Type safety: Strong TypeScript usage throughout with proper type definitions

2. Performance Optimizations

  • rAF batching prevents excessive re-renders when events arrive rapidly
  • Ref-based accumulation avoids stale closure issues
  • Memoization with useCallback prevents unnecessary re-renders

3. Robust Error Handling

  • Blocker submission includes try/catch with user-friendly error messages
  • Batch execution has error boundaries for stop/cancel operations
  • SSE disconnection handling with reconnection UI
  • API error normalization handles multiple FastAPI error formats

4. Accessibility

  • Proper ARIA labels on interactive elements (aria-label, aria-live, aria-expanded)
  • Semantic HTML with role="log" and role="alert"
  • Keyboard shortcuts (Ctrl+Enter) for blocker submission

5. Test Coverage

  • 53 new tests covering hooks, components, and utilities
  • Tests use proper mocking strategies (rAF queue, SSE status)
  • Good coverage of edge cases (disconnection, completion states)

🔍 Issues & Recommendations

High Priority

1. Potential XSS Vulnerability in Event Rendering 🔴

Location: Multiple event components (FileChangeEvent, ShellCommandEvent, VerificationEvent)

Issue: If backend event data contains user-controlled content (file paths, shell commands, error messages), rendering them directly could expose XSS risks.

Recommendation:

  • Ensure all dynamic content is properly escaped (React does this by default for text content, but verify any use of dangerouslySetInnerHTML)
  • Sanitize file paths and command outputs before display
  • Add Content Security Policy headers

2. Missing Error Boundaries 🟡

Location: EventStream.tsx - event rendering

Issue: If any event fails to render, the entire stream crashes.

Recommendation: Add error boundary around EventItem to prevent cascading failures

3. Race Condition in Batch Polling 🟡

Location: BatchExecutionMonitor.tsx:90-99

Issue: fetchBatch is recreated on every render due to dependencies, causing the effect to run unnecessarily. This could lead to multiple concurrent intervals.

Recommendation: Remove fetchBatch from the useEffect dependency array and only depend on batch?.status to prevent duplicate intervals.


Medium Priority

4. Memory Leak: Unbounded Event Accumulation 🟡

Location: useExecutionMonitor.ts:139

Issue: Events accumulate indefinitely. For long-running tasks, this could consume significant memory.

Recommendation: Implement a sliding window (MAX_EVENTS = 1000) to limit memory usage

5. Type Safety: Status String Literals 🟡

Location: BatchExecutionMonitor.tsx:147, eventStyles.ts:56-60

Issue: Using string literals for status checks instead of type unions.

Recommendation: Use TypeScript const assertions for type-safe status checks

6. Missing Loading States 🟡

Location: ExecutionPage.tsx:47-54

Issue: No loading indicator during stop operation. User gets no feedback.

Recommendation: Add loading state during async operations


Low Priority

7. Console.error Instead of Proper Error Reporting 🔵

Multiple instances of console.error without user notification at:

  • ExecutionPage.tsx:35, 52
  • BatchExecutionMonitor.tsx:76

Recommendation: Use toast notifications or error states to inform users.

8. Hardcoded Magic Numbers 🔵

  • EventStream.tsx:56 - < 40 threshold for "at bottom" detection
  • BatchExecutionMonitor.tsx:94 - 5000ms polling interval

Recommendation: Extract to named constants

9. Accessibility: Focus Management 🔵

Location: BlockerEvent.tsx

When a blocker appears, focus should automatically move to the textarea for keyboard users.


🔒 Security Considerations

  1. API Authentication: Verify all SSE endpoints require authentication (check backend implementation)
  2. Rate Limiting: Consider rate limiting blocker submissions to prevent abuse
  3. Input Validation: Ensure backend validates blocker answers before processing
  4. CORS: Verify CORS configuration allows only trusted origins

🧪 Testing Recommendations

Add Integration Tests For:

  1. SSE reconnection - Test behavior when connection drops mid-execution
  2. Concurrent batch execution - Multiple tasks streaming simultaneously
  3. Memory limits - Long-running tasks with thousands of events
  4. Error recovery - Network failures during blocker submission

📊 Performance Considerations

Good:

  • ✅ rAF batching prevents render thrashing
  • ✅ Memoized callbacks with proper dependencies
  • ✅ Refs used to avoid stale closures

Monitor:

  • ⚠️ EventStream re-renders on every event (expected, but monitor performance)
  • ⚠️ Batch polling creates network overhead (consider WebSocket upgrade)

✨ Final Verdict

Overall Assessment: ⭐⭐⭐⭐ (4/5)

This is a well-crafted implementation with strong architecture, good test coverage, and thoughtful UX. The primary concerns are around potential XSS vulnerabilities, missing error boundaries, and the batch polling race condition.

Recommendation: Approve with Changes

Must Fix Before Merge:

  1. Add error boundary around event rendering
  2. Fix batch polling race condition
  3. Verify XSS protection in backend

Can Address in Follow-up:

  1. Memory management for long-running executions
  2. Toast notifications for errors
  3. Focus management for blockers
  4. Additional integration tests

Great work on this feature! The execution monitor significantly improves the user experience for real-time task monitoring. 🚀

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
codeframe/ui/routers/streaming_v2.py (1)

22-25: ⚠️ Potential issue | 🟡 Minor

Remove the unused router object or move the stream endpoint to streaming_v2.py.

The APIRouter is mounted in server.py:477 but has no routes registered on it—all routes are defined on tasks_v2.router instead. This adds unnecessary overhead and contradicts the guideline that "streaming router must provide /api/v2/tasks/{id}/stream endpoint." Either remove this router entirely (since the module now serves only as SSE utilities) or relocate the stream_task_events endpoint from tasks_v2.py:855 here.

🤖 Fix all issues with AI agents
In `@codeframe/ui/routers/streaming_v2.py`:
- Around line 143-147: Add a public unsubscribe method on EventPublisher (e.g.,
EventPublisher.unsubscribe(subscription)) and use it from the finally block
instead of touching publisher._lock and publisher._subscribers directly;
implement unsubscribe to take care of acquiring the internal lock, safely
locating the subscriber list for subscription.task_id (use get to avoid
KeyError), remove the subscription if present, and delete the task_id key only
if the list is empty (avoiding recreating behavior for defaultdicts), so both
the setup (where subscriptions are added) and teardown use the same safe, public
API.
- Around line 107-114: The code is directly importing and using the private
_Subscription class and touching publisher._lock and publisher._subscribers;
replace that with the public EventPublisher.subscribe(task_id) call and let the
publisher manage queue/lock/subscriber lifecycle. Call
publisher.subscribe(task_id) to obtain the subscription object (or subscription
handle) and then implement HTTP-specific behavior (heartbeat using
heartbeat_interval, request disconnection monitoring and cleanup) on top of that
returned handle rather than mutating publisher._subscribers or publisher._lock
or importing _Subscription. Ensure you remove direct references to
_Subscription, publisher._lock, and publisher._subscribers and use the public
subscribe/unsubscribe APIs for cleanup.

In `@codeframe/ui/routers/tasks_v2.py`:
- Around line 855-896: Add the standard rate limit decorator to the SSE
endpoint: apply `@rate_limit_standard`() directly above the async def
stream_task_events(...) function (same style as used on
stream_task_output_lines) so the long-lived Server-Sent Events connection is
subject to the same request limits; ensure the decorator is properly imported or
already available in the module and keep the decorator placement immediately
above the function definition.
- Around line 604-626: The current _run_agent fire-and-forget uses a daemon
thread and doesn't manage lifecycle or protect the error-publishing calls; fix
by 1) replacing the daemon behavior with tracked threads: before starting, check
a shared registry (e.g., active_agent_threads keyed by task_id) to avoid
duplicate runs and register the Thread instance
(threading.Thread(target=_run_agent)) so it can be joined on shutdown; 2) make
the thread non-daemon and ensure you remove the entry from active_agent_threads
when finished (in a finally block) so state stays consistent; and 3) harden the
exception handler inside _run_agent: wrap publisher.publish_sync and
publisher.complete_task_sync in their own try/except blocks (log any failures
with logger.error including exception info and error_type), and ensure a final
attempt to mark the task complete or failed (using task_id and ErrorEvent) in a
finally section so runtime.execute_agent failures cannot leave tasks IN_PROGRESS
and publishing errors are surfaced.
🧹 Nitpick comments (3)
web-ui/__tests__/lib/api.test.ts (1)

1-68: Consider adding unit tests for blockersApi, batchesApi, and tasksApi.stopExecution/resumeExecution.

The test file currently only covers normalizeErrorDetail. The new API methods lack dedicated unit tests that verify correct URL construction, parameter passing, and error handling — especially the encodeURIComponent behavior on user-provided IDs (blockerId, batchId, taskId) in endpoint paths.

codeframe/ui/routers/tasks_v2.py (1)

598-602: Inline imports from streaming_v2 and core.models inside the endpoint handler.

These imports are deferred to avoid circular dependencies, but it makes the dependency graph opaque. Consider whether the circular dependency can be resolved at the module level (e.g., by restructuring imports or using a shared interface). If the inline import is necessary, a brief comment explaining why would help future maintainers.

tests/ui/test_v2_routers_integration.py (1)

748-779: Good 404 coverage, but no integration tests for happy-path streaming or background execution.

The renamed/new tests correctly validate 404 responses for both /output and /stream endpoints. However, the following scenarios introduced in this PR lack integration test coverage:

  1. start_single_task with execute=True — the background thread execution path (lines 598-629 in tasks_v2.py) is untested.
  2. stream_task_events for an existing task — no test verifies that a 200 with text/event-stream content type is returned when the task exists.
  3. Behavioral difference: /output returns 404 when no run exists, but /stream returns 200 (streaming). This asymmetry isn't tested or documented.

These are understandable to defer (PR summary mentions "manual interactive checks remain"), but worth tracking.

Do you want me to generate integration test stubs for the background execution and structured event streaming happy paths, or open an issue to track this?

Comment on lines +107 to +114
queue: asyncio.Queue = asyncio.Queue(maxsize=1000)
loop = asyncio.get_running_loop()

from codeframe.core.streaming import _Subscription
subscription = _Subscription(task_id, queue, loop)

async with publisher._lock:
publisher._subscribers[task_id].append(subscription)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the full streaming_v2.py file to see the context around lines 107-114 and 143-147
fd streaming_v2.py --type f

Repository: frankbria/codeframe

Length of output: 99


🏁 Script executed:

# Find the core streaming module
fd streaming.py --type f | grep core

Repository: frankbria/codeframe

Length of output: 154


🏁 Script executed:

# Read streaming_v2.py to see context around lines 107-114 and 143-147
wc -l codeframe/ui/routers/streaming_v2.py

Repository: frankbria/codeframe

Length of output: 103


🏁 Script executed:

# Read core/streaming.py to check EventPublisher implementation
wc -l codeframe/core/streaming.py

Repository: frankbria/codeframe

Length of output: 94


🏁 Script executed:

# Read the full streaming_v2.py file
cat -n codeframe/ui/routers/streaming_v2.py

Repository: frankbria/codeframe

Length of output: 5950


🏁 Script executed:

# Read core/streaming.py to find EventPublisher class definition
cat -n codeframe/core/streaming.py | head -200

Repository: frankbria/codeframe

Length of output: 7011


🏁 Script executed:

# Find the EventPublisher class definition in core/streaming.py
ast-grep --pattern $'class EventPublisher {
  $$$
}'

Repository: frankbria/codeframe

Length of output: 45


🏁 Script executed:

# Also search for public subscribe methods
rg -n 'def subscribe|def unsubscribe|async def subscribe|async def unsubscribe' codeframe/core/streaming.py -C3

Repository: frankbria/codeframe

Length of output: 1059


🏁 Script executed:

# Read the full EventPublisher class definition and methods
sed -n '290,350p' codeframe/core/streaming.py

Repository: frankbria/codeframe

Length of output: 2551


🏁 Script executed:

# Read the unsubscribe method
sed -n '472,490p' codeframe/core/streaming.py

Repository: frankbria/codeframe

Length of output: 814


🏁 Script executed:

# Also read the _Subscription class to understand its structure
sed -n '200,290p' codeframe/core/streaming.py

Repository: frankbria/codeframe

Length of output: 2928


🏁 Script executed:

# Check how subscribe() is being used elsewhere in the codebase
rg -n 'publisher\.subscribe\|\.subscribe\(' --type=py -C2 | head -100

Repository: frankbria/codeframe

Length of output: 45


🏁 Script executed:

# Check if there are any other places accessing EventPublisher internals
rg -n 'publisher\._lock|publisher\._subscribers|_Subscription' --type=py -C2

Repository: frankbria/codeframe

Length of output: 3523


🏁 Script executed:

# Let's see the full subscribe() method implementation to understand what it returns
sed -n '319,365p' codeframe/core/streaming.py

Repository: frankbria/codeframe

Length of output: 2019


🏁 Script executed:

# Check if there are other SSE/streaming handlers that might use subscribe() properly
rg -n 'async def.*stream|AsyncGenerator' codeframe/ui/routers/ --type=py

Repository: frankbria/codeframe

Length of output: 432


🏁 Script executed:

# Check how stream_task_events in tasks_v2.py implements streaming
sed -n '856,920p' codeframe/ui/routers/tasks_v2.py

Repository: frankbria/codeframe

Length of output: 2063


Use the public EventPublisher.subscribe() method instead of directly accessing private internals.

The router directly imports _Subscription and accesses publisher._lock and publisher._subscribers (lines 110-114, 143-147), breaking encapsulation and violating the thin adapter pattern. EventPublisher already exposes a public subscribe(task_id) method that handles subscription management, queue creation, and cleanup. The HTTP-specific concerns (heartbeats via heartbeat_interval, request disconnection detection) should be layered on top of the public API rather than reimplementing subscription logic by reaching into private state.

🤖 Prompt for AI Agents
In `@codeframe/ui/routers/streaming_v2.py` around lines 107 - 114, The code is
directly importing and using the private _Subscription class and touching
publisher._lock and publisher._subscribers; replace that with the public
EventPublisher.subscribe(task_id) call and let the publisher manage
queue/lock/subscriber lifecycle. Call publisher.subscribe(task_id) to obtain the
subscription object (or subscription handle) and then implement HTTP-specific
behavior (heartbeat using heartbeat_interval, request disconnection monitoring
and cleanup) on top of that returned handle rather than mutating
publisher._subscribers or publisher._lock or importing _Subscription. Ensure you
remove direct references to _Subscription, publisher._lock, and
publisher._subscribers and use the public subscribe/unsubscribe APIs for
cleanup.

Comment on lines +143 to +147
async with publisher._lock:
if subscription in publisher._subscribers[task_id]:
publisher._subscribers[task_id].remove(subscription)
if not publisher._subscribers[task_id]:
del publisher._subscribers[task_id]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Cleanup duplicates the private-API coupling from subscription setup.

The finally block again accesses publisher._lock and publisher._subscribers directly. If a public unsubscribe(subscription) method existed on EventPublisher, both the setup (lines 113-114) and teardown would be cleaner and less fragile.

Also note: if publisher._subscribers is a defaultdict, the del on line 147 removes the key, but a subsequent append (from another connection) would silently recreate it. If it's a plain dict, accessing a non-existent task_id in line 144 could KeyError if another coroutine already deleted the key between the _lock acquisition and the in check — though the async lock should prevent this within a single event loop.

🤖 Prompt for AI Agents
In `@codeframe/ui/routers/streaming_v2.py` around lines 143 - 147, Add a public
unsubscribe method on EventPublisher (e.g.,
EventPublisher.unsubscribe(subscription)) and use it from the finally block
instead of touching publisher._lock and publisher._subscribers directly;
implement unsubscribe to take care of acquiring the internal lock, safely
locating the subscriber list for subscription.task_id (use get to avoid
KeyError), remove the subscription if present, and delete the task_id key only
if the list is empty (avoiding recreating behavior for defaultdicts), so both
the setup (where subscriptions are added) and teardown use the same safe, public
API.

Comment on lines +604 to +626
def _run_agent():
try:
runtime.execute_agent(
workspace,
run,
dry_run=dry_run,
verbose=verbose,
event_publisher=publisher,
)
except Exception as exc:
logger.error(f"Background agent failed for task {task_id}: {exc}", exc_info=True)
publisher.publish_sync(
task_id,
ErrorEvent(
task_id=task_id,
error=str(exc),
error_type=type(exc).__name__,
),
)
publisher.complete_task_sync(task_id)

thread = threading.Thread(target=_run_agent, daemon=True)
thread.start()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Daemon thread for agent execution is fire-and-forget with no lifecycle management.

Several concerns with this approach:

  1. Daemon thread: If the server process shuts down, the agent is killed mid-execution with no cleanup, potentially leaving tasks in an inconsistent state (e.g., IN_PROGRESS forever).
  2. No thread tracking: The spawned thread isn't stored anywhere — there's no way to check if execution is already running, join it on shutdown, or enforce one-agent-per-task.
  3. Nested exception risk: If publisher.publish_sync or publisher.complete_task_sync on lines 615-623 themselves raise, the error is silently swallowed (no outer try/catch inside _run_agent's except block).

For (3), consider wrapping the error-publishing calls:

Proposed defensive fix for error publishing
             except Exception as exc:
                 logger.error(f"Background agent failed for task {task_id}: {exc}", exc_info=True)
-                publisher.publish_sync(
-                    task_id,
-                    ErrorEvent(
-                        task_id=task_id,
-                        error=str(exc),
-                        error_type=type(exc).__name__,
-                    ),
-                )
-                publisher.complete_task_sync(task_id)
+                try:
+                    publisher.publish_sync(
+                        task_id,
+                        ErrorEvent(
+                            task_id=task_id,
+                            error=str(exc),
+                            error_type=type(exc).__name__,
+                        ),
+                    )
+                    publisher.complete_task_sync(task_id)
+                except Exception as pub_exc:
+                    logger.error(
+                        f"Failed to publish error event for task {task_id}: {pub_exc}",
+                        exc_info=True,
+                    )

Items (1) and (2) are broader architectural concerns that may be acceptable for an initial implementation but should be tracked.

🤖 Prompt for AI Agents
In `@codeframe/ui/routers/tasks_v2.py` around lines 604 - 626, The current
_run_agent fire-and-forget uses a daemon thread and doesn't manage lifecycle or
protect the error-publishing calls; fix by 1) replacing the daemon behavior with
tracked threads: before starting, check a shared registry (e.g.,
active_agent_threads keyed by task_id) to avoid duplicate runs and register the
Thread instance (threading.Thread(target=_run_agent)) so it can be joined on
shutdown; 2) make the thread non-daemon and ensure you remove the entry from
active_agent_threads when finished (in a finally block) so state stays
consistent; and 3) harden the exception handler inside _run_agent: wrap
publisher.publish_sync and publisher.complete_task_sync in their own try/except
blocks (log any failures with logger.error including exception info and
error_type), and ensure a final attempt to mark the task complete or failed
(using task_id and ErrorEvent) in a finally section so runtime.execute_agent
failures cannot leave tasks IN_PROGRESS and publishing errors are surfaced.

Comment on lines +855 to +896
@router.get("/{task_id}/stream")
async def stream_task_events(
request: Request,
task_id: str,
workspace: Workspace = Depends(get_v2_workspace),
) -> StreamingResponse:
"""Stream structured execution events for a task via SSE.

Returns Server-Sent Events with JSON-formatted ExecutionEvent payloads.
Compatible with browser EventSource (no custom auth headers required).

Event types (in data.event_type):
- ``progress``: Phase/step transitions
- ``output``: stdout/stderr lines
- ``blocker``: Human input needed
- ``completion``: Task finished (success/failure/blocked)
- ``error``: Execution error
- ``heartbeat``: Keep-alive

For raw text output lines (cf work follow equivalent),
use GET /{task_id}/output instead.
"""
task = tasks.get(workspace, task_id)
if not task:
raise HTTPException(
status_code=404,
detail=api_error("Task not found", ErrorCodes.NOT_FOUND, f"No task with id {task_id}"),
)

from codeframe.ui.routers.streaming_v2 import event_stream_generator, get_event_publisher

publisher = get_event_publisher()

return StreamingResponse(
event_stream_generator(task_id, publisher, request),
media_type="text/event-stream",
headers={
"Cache-Control": "no-cache",
"Connection": "keep-alive",
"X-Accel-Buffering": "no",
},
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing @rate_limit_standard() on the structured event stream endpoint.

stream_task_output_lines (line 750) has @rate_limit_standard(), but stream_task_events does not. This is an SSE endpoint that holds a long-lived connection and consumes server resources — without rate limiting, a client can open unbounded connections per task or across tasks.

Proposed fix
 `@router.get`("/{task_id}/stream")
+@rate_limit_standard()
 async def stream_task_events(
🤖 Prompt for AI Agents
In `@codeframe/ui/routers/tasks_v2.py` around lines 855 - 896, Add the standard
rate limit decorator to the SSE endpoint: apply `@rate_limit_standard`() directly
above the async def stream_task_events(...) function (same style as used on
stream_task_output_lines) so the long-lived Server-Sent Events connection is
subject to the same request limits; ensure the decorator is properly imported or
already available in the module and keep the decorator placement immediately
above the function definition.

Next.js rewrite proxy buffers chunked responses, which prevents SSE
events from streaming in real-time. EventSource now connects directly
to the FastAPI backend via NEXT_PUBLIC_SSE_URL (defaults to
http://localhost:8000).

Also quiets expected stop-button errors when no active run exists.
@claude

claude Bot commented Feb 6, 2026

Copy link
Copy Markdown

Code Review - Execution Monitor PR #339

Summary

This PR delivers a comprehensive real-time execution monitoring system with SSE streaming, live event display, and inline blocker handling. The implementation is well-architected with proper separation of concerns between hooks, components, and API layers.

✅ Strengths

1. Excellent Architecture & Separation of Concerns

  • Clean separation between useTaskStream (raw SSE connection) and useExecutionMonitor (state accumulation)
  • Proper use of requestAnimationFrame batching to prevent render thrashing during rapid event streams
  • Server-side SSE utilities properly isolated in streaming_v2.py with shared utilities
  • Event type system with comprehensive TypeScript types matching backend models

2. Performance Optimizations

  • useExecutionMonitor uses refs + rAF batching to avoid per-event re-renders (lines 52-125 in useExecutionMonitor.ts)
  • Proper cleanup of rAF on unmount (lines 128-134)
  • SSE heartbeat mechanism prevents proxy timeouts (streaming_v2.py:88-134)
  • Queue-based event distribution with proper subscription management

3. Robust State Management

  • Agent state derivation properly maps backend events to UI states (eventStyles.ts:36-68)
  • Proper handling of SSE disconnection/reconnection scenarios
  • Completion status properly tracked with terminal states
  • Task deduplication in batch monitor via refs (prevents stale closure issues)

4. Excellent UX Patterns

  • Inline blocker answering with proper loading states and keyboard shortcuts (Ctrl/Cmd+Enter)
  • Stop confirmation dialog prevents accidental termination
  • Connection status indicator with visual feedback
  • Completion banners with appropriate actions
  • Auto-scrolling event stream with proper scroll detection

5. Comprehensive Test Coverage

  • 53 new tests covering hooks, components, event rendering, and styles
  • Backend SSE formatting tests in test_streaming_router.py
  • Integration tests for the full stack
  • Proper mock setup for icons and API calls

🔍 Issues & Recommendations

1. Security & Input Validation

Medium Priority: The blocker answer submission lacks input validation:

// web-ui/src/components/execution/BlockerEvent.tsx:28
const handleSubmit = async () => {
  if (!answer.trim() || isSubmitting) return;
  // No max length check or sanitization
  await blockersApi.answer(workspacePath, String(event.blocker_id), answer.trim());

Recommendation: Add max length validation and sanitize input:

const MAX_ANSWER_LENGTH = 10000;
const handleSubmit = async () => {
  const trimmed = answer.trim();
  if (!trimmed || isSubmitting || trimmed.length > MAX_ANSWER_LENGTH) return;
  await blockersApi.answer(workspacePath, String(event.blocker_id), trimmed);
};

2. Error Handling

Low Priority: SSE connection errors could be more user-friendly:

// web-ui/src/hooks/useTaskStream.ts:144-146
} catch {
  // Ignore malformed messages (e.g. SSE comments)
}

Recommendation: Log to console in development for debugging:

} catch (err) {
  if (process.env.NODE_ENV === 'development') {
    console.warn('Failed to parse SSE event:', data, err);
  }
}

3. Memory Leak Potential

Low Priority: EventPublisher subscription cleanup in streaming_v2.py:142-148 has a potential race condition if multiple subscriptions are removed concurrently.

Recommendation: The current implementation with async context manager is good, but consider adding a try-finally around the subscription list manipulation for extra safety.

4. Accessibility

Low Priority: Some dynamic content lacks ARIA live regions:

  • Event stream updates aren't announced to screen readers
  • Progress updates lack aria-live regions

Recommendation: Add to EventStream component:

<div role="log" aria-live="polite" aria-label="Execution events">

5. Type Safety

Low Priority: In useTaskStream.ts:113, the SSE URL construction could be typed more strictly:

const sseBase = process.env.NEXT_PUBLIC_SSE_URL || 'http://localhost:8000';

Recommendation: Validate URL format and provide type guard.

📋 Alignment with CLAUDE.md

✅ Follows v2 Architecture:

  • Server layer properly acts as thin adapter over core (streaming_v2.py delegates to core.streaming)
  • No FastAPI imports in core modules
  • Proper event emission via EventPublisher

✅ Phase 2 Deliverables:

  • Real-time SSE streaming implemented ✅
  • Integration with existing task execution flow ✅
  • Proper workspace-based routing ✅

⚠️ Minor Deviation:
The PR description mentions this is Phase 3 work (Web UI), but CLAUDE.md shows Phase 2 (Server Layer) as complete and Phase 3 as planned. This is actually advancing Phase 3, which is good progress, but should be noted in the roadmap docs.

🎯 Code Quality Assessment

Aspect Rating Notes
Architecture ⭐⭐⭐⭐⭐ Excellent separation, proper layering
Performance ⭐⭐⭐⭐⭐ rAF batching, proper memoization
Type Safety ⭐⭐⭐⭐ Good TypeScript usage, minor improvements possible
Error Handling ⭐⭐⭐⭐ Comprehensive, could be more user-friendly
Test Coverage ⭐⭐⭐⭐⭐ 53 new tests, comprehensive scenarios
Accessibility ⭐⭐⭐ Basic support, could use ARIA enhancements
Documentation ⭐⭐⭐⭐ Good inline comments, clear component docs

✅ Final Verdict

APPROVE with minor suggestions

This is high-quality work that delivers the core execution monitoring functionality with proper architecture and performance characteristics. The identified issues are minor and can be addressed in follow-up PRs or as inline improvements.

Key Achievements:

  • Complete SSE streaming infrastructure from backend to frontend
  • Sophisticated state management with rAF batching
  • Inline blocker handling closes the human-in-the-loop gap
  • Batch execution monitoring supports the conductor workflow
  • Excellent test coverage (253 passing tests)

Suggested Follow-ups (not blockers):

  1. Add input validation to blocker answers (security hardening)
  2. Enhance accessibility with ARIA live regions
  3. Improve SSE error logging for development
  4. Update V2_STRATEGIC_ROADMAP.md to reflect Phase 3 progress

🧪 Testing Recommendations

Before merging, verify:

  • SSE reconnection behavior when backend restarts
  • Multiple concurrent task streams (memory leak check)
  • Blocker answer submission with network delays
  • Stop button behavior during different agent phases
  • Browser back/forward navigation from execution page

Great work on this feature! The execution monitor will significantly improve the developer experience. 🚀

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@web-ui/src/app/execution/`[taskId]/page.tsx:
- Around line 26-29: getSelectedWorkspacePath() currently guards SSR but calls
localStorage.getItem() without protection; modify getSelectedWorkspacePath (the
function referenced by useEffect in page.tsx) to wrap any localStorage access in
a try/catch and return null on error so workspace hydration via
setWorkspacePath(getSelectedWorkspacePath()) won't throw in incognito/restricted
contexts; ensure the catch swallows or logs the error and returns null (do not
change the useEffect), preserving existing behavior when window is undefined or
item is missing.
🧹 Nitpick comments (3)
web-ui/src/hooks/useTaskStream.ts (1)

119-149: Unstable handleMessage reference may trigger SSE reconnections.

handleMessage depends on all six callback props (onEvent, onProgress, onOutput, onBlocker, onComplete, onError). If any caller passes inline arrow functions (not wrapped in useCallback), handleMessage gets a new identity every render, which likely causes useEventSource to tear down and re-establish the SSE connection.

Consider using a useRef to hold the latest callbacks, so handleMessage has a stable identity:

♻️ Sketch using refs for stable identity
+ const callbacksRef = useRef({ onEvent, onProgress, onOutput, onBlocker, onComplete, onError });
+ useEffect(() => {
+   callbacksRef.current = { onEvent, onProgress, onOutput, onBlocker, onComplete, onError };
+ });
+
  const handleMessage = useCallback(
    (data: string) => {
      try {
        const event = JSON.parse(data) as ExecutionEvent;
        setLastEvent(event);
-       onEvent?.(event);
+       callbacksRef.current.onEvent?.(event);
        switch (event.event_type) {
          case 'progress':
-           onProgress?.(event);
+           callbacksRef.current.onProgress?.(event);
            break;
          // … same for other cases
        }
      } catch { }
    },
-   [onEvent, onProgress, onOutput, onBlocker, onComplete, onError]
+   [] // stable — reads latest callbacks from ref
  );
web-ui/src/app/execution/[taskId]/page.tsx (2)

70-77: Use <Link> instead of <a> for internal navigation consistency.

Line 74 uses a raw <a href="/tasks"> for internal navigation, while line 86 uses Next.js <Link>. Using <Link> enables client-side navigation without a full page reload.

♻️ Proposed fix
-        <a href="/tasks" className="text-blue-600 hover:underline">Back to Task Board</a>
+        <Link href="/tasks" className="text-blue-600 hover:underline">Back to Task Board</Link>

61-64: No-op callback could be replaced with a comment or removed.

handleBlockerAnswered is an empty useCallback that does nothing. The inline comment explains why, which is good. However, if EventStream's onBlockerAnswered prop is optional, you could skip passing it entirely and let the comment live near line 147 instead.

Comment on lines +26 to +29
useEffect(() => {
setWorkspacePath(getSelectedWorkspacePath());
setWorkspaceReady(true);
}, []);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg "getSelectedWorkspacePath" --type ts --type tsx -B 2 -A 10

Repository: frankbria/codeframe

Length of output: 90


🏁 Script executed:

rg "getSelectedWorkspacePath" -A 10 -B 2

Repository: frankbria/codeframe

Length of output: 18007


🏁 Script executed:

cat web-ui/src/lib/workspace-storage.ts

Repository: frankbria/codeframe

Length of output: 2638


Workspace hydration should handle localStorage access errors.

getSelectedWorkspacePath() at line 26 checks for SSR context (typeof window === 'undefined') but does not wrap localStorage.getItem() in a try/catch. In incognito/private mode or restricted iframe contexts, localStorage can throw an error, which would propagate as an unhandled exception. Wrap the call in try/catch to gracefully return null on access errors.

🤖 Prompt for AI Agents
In `@web-ui/src/app/execution/`[taskId]/page.tsx around lines 26 - 29,
getSelectedWorkspacePath() currently guards SSR but calls localStorage.getItem()
without protection; modify getSelectedWorkspacePath (the function referenced by
useEffect in page.tsx) to wrap any localStorage access in a try/catch and return
null on error so workspace hydration via
setWorkspacePath(getSelectedWorkspacePath()) won't throw in incognito/restricted
contexts; ensure the catch swallows or logs the error and returns null (do not
change the useEffect), preserving existing behavior when window is undefined or
item is missing.

Comment on lines +164 to +167
const agentState =
!prev.isCompleted && (sseStatus === 'error' || sseStatus === 'closed')
? 'DISCONNECTED'
: prev.agentState;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low

hooks/useExecutionMonitor.ts:164 When SSE reconnects (sseStatus becomes 'open'), agentState remains 'DISCONNECTED' because the code just preserves prev.agentState. Consider re-deriving agentState from the last event when the connection recovers.

Suggested change
const agentState =
!prev.isCompleted && (sseStatus === 'error' || sseStatus === 'closed')
? 'DISCONNECTED'
: prev.agentState;
const agentState =
!prev.isCompleted && (sseStatus === 'error' || sseStatus === 'closed')
? 'DISCONNECTED'
: prev.agentState === 'DISCONNECTED' && sseStatus === 'open'
? (findLastNonHeartbeat(eventsRef.current) ? deriveAgentState(findLastNonHeartbeat(eventsRef.current)!) : 'CONNECTING')
: prev.agentState;

🚀 Want me to fix this? Reply ex: "fix it for me".

The execution monitor was stuck on "Connecting" because:
1. Heartbeats were SSE comments (ignored by EventSource onmessage)
2. Already-completed tasks had no events to deliver (EventPublisher
   drops events when no subscribers exist)

Now the /stream endpoint:
- Sends an initial ProgressEvent immediately (transitions UI from
  CONNECTING to EXECUTING)
- For tasks with a terminal run (COMPLETED/FAILED/BLOCKED), emits a
  synthetic CompletionEvent so the UI shows the final state
- For active tasks, delegates to the live event_stream_generator
@claude

claude Bot commented Feb 6, 2026

Copy link
Copy Markdown

Code Review - Recent Updates to PR #339

I've reviewed the latest changes (3 commits since the last review) and the overall PR implementation. The recent fixes have addressed critical SSE streaming issues and improved the user experience significantly.


✅ Recent Fixes Reviewed

1. Error Context Enhancement (commit 762d111)

Shows full error context in normalizeErrorDetail - good improvement for debugging.

2. SSE Direct Backend Connection (commit 97147c9)

CRITICAL FIX: Bypassing Next.js proxy for SSE is the correct approach. Next.js rewrites buffer chunked responses, breaking SSE streaming. The implementation at useTaskStream.ts:111-117 properly uses NEXT_PUBLIC_SSE_URL to connect directly to FastAPI.

Excellent architectural decision - this is exactly how SSE should be handled with Next.js.

3. Initial SSE Event & Terminal State Handling (commit d61f7d9)

MAJOR UX IMPROVEMENT: The backend now sends an initial ProgressEvent (tasks_v2.py:906-915) to immediately transition the client from CONNECTING state. This prevents UI flicker and provides instant feedback.

The terminal state detection (tasks_v2.py:894-934) elegantly handles the case where a task has already completed before the client connects, sending a synthetic completion event instead of waiting indefinitely.


🔍 Detailed Code Quality Assessment

Strengths ⭐⭐⭐⭐⭐

  1. EventStream Accessibility (EventStream.tsx:78-80)

    • Added role="log", aria-live="polite", and aria-label as suggested in previous reviews
    • Proper semantic HTML for screen readers
  2. Clean SSE Event Handling

    • Heartbeat filtering in EventStream (line 33) keeps the UI clean
    • Proper SSE event parsing with error tolerance (useTaskStream.ts:144-146)
  3. Performance Optimization

    • rAF batching in useExecutionMonitor prevents render thrashing during rapid event bursts
    • Ref-based event accumulation (lines 53-54) avoids unnecessary re-renders
  4. State Management

    • Clear separation between raw SSE events (useTaskStream) and derived UI state (useExecutionMonitor)
    • Proper cleanup on unmount and task changes (useExecutionMonitor.ts:72-88)
  5. Backend Architecture

    • streaming_v2.py properly isolates SSE utilities as shared infrastructure
    • event_stream_generator (lines 84-149) has robust error handling and cleanup
    • Subscription management with async context manager prevents leaks

🐛 Issues Found (Minor - Non-Blocking)

1. Type Safety - Medium Priority

Location: useTaskStream.ts:113

Issue: No validation that NEXT_PUBLIC_SSE_URL is a valid URL format. Malformed URLs will cause runtime failures.

Recommendation: Add URL validation helper function to validate format and provide better error messages in development.

2. Error Logging - Low Priority

Location: useTaskStream.ts:144-146

Issue: Silent error suppression makes debugging difficult. Previous review suggested logging in development mode.

Recommendation: Add console.warn in development mode to help debug SSE parsing issues.

3. Race Condition in Subscription Cleanup - Low Priority

Location: streaming_v2.py:142-148

Issue: The current code is safe but could be more defensive with an additional check before deleting from subscribers dict.

4. Environment Variable Documentation - Low Priority

Issue: .env.example should document NEXT_PUBLIC_SSE_URL with clear instructions about when it's needed.


📋 Alignment with CLAUDE.md

V2 Architecture Compliance:

  • Server layer acts as thin adapter over core (streaming_v2.py delegates to core.streaming)
  • No FastAPI imports in core modules
  • Proper event emission via EventPublisher
  • Router delegates to core.tasks, core.runtime modules

Phase 2 Complete, Phase 3 In Progress:
This PR advances Phase 3 (Web UI Rebuild) with production-ready execution monitoring.


🎯 Final Assessment

Aspect Rating Notes
Architecture ⭐⭐⭐⭐⭐ Exemplary SSE direct-to-backend pattern
Recent Fixes ⭐⭐⭐⭐⭐ All three commits address real issues effectively
Error Handling ⭐⭐⭐⭐ Comprehensive, could improve dev logging
Type Safety ⭐⭐⭐⭐ Strong overall, URL validation would improve
Performance ⭐⭐⭐⭐⭐ rAF batching, proper ref usage
Accessibility ⭐⭐⭐⭐⭐ ARIA regions added as suggested
Test Coverage ⭐⭐⭐⭐⭐ 253 passing tests, comprehensive
Documentation ⭐⭐⭐⭐ Inline comments good, .env.example needs update

✅ Recommendation: APPROVE

The recent fixes have resolved critical issues (SSE buffering, connection state, terminal task handling) and the code quality is excellent. The identified issues are minor and can be addressed in follow-up PRs.

Key Achievements Since Last Review:

  • ✅ Fixed SSE streaming by bypassing Next.js proxy (critical bug fix)
  • ✅ Added initial event to prevent UI flicker (great UX improvement)
  • ✅ Handles already-completed tasks gracefully (edge case handled)
  • ✅ Added ARIA live regions for accessibility (previous feedback addressed)

Suggested Follow-ups (Not Blockers):

  1. Add SSE URL validation helper with error logging
  2. Improve development mode error logging for SSE parsing
  3. Update .env.example with SSE configuration docs
  4. Add defensive check in subscription cleanup
  5. Update V2_STRATEGIC_ROADMAP.md to reflect Phase 3 progress

Great work on this comprehensive feature! The execution monitor significantly improves the developer experience and the implementation quality is production-ready. 🚀

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@codeframe/ui/routers/tasks_v2.py`:
- Around line 897-938: Race condition: subscribe to the publisher before or
re-check run state after subscribing so CompletionEvent isn't missed; modify
_generate in tasks_v2.py to (1) call event_stream_generator (or otherwise
subscribe to EventPublisher) before relying on get_active_run/get_latest_run, or
(preferred) after you obtain the subscription, re-query
runtime.get_latest_run(task_id) (and/or runtime.get_active_run) inside _generate
and if that newly fetched latest_run is terminal emit a synthetic
CompletionEvent via format_sse_event(CompletionEvent(...)) before entering the
async for loop over event_stream_generator so any completion that happened
during subscription is delivered to the client; reference functions/vars:
get_active_run, get_latest_run, already_terminal, _generate,
event_stream_generator, EventPublisher, CompletionEvent, format_sse_event.

Comment on lines +897 to +938
run = runtime.get_active_run(workspace, task_id)
latest_run = runtime.get_latest_run(workspace, task_id) if not run else None
already_terminal = (
latest_run is not None
and latest_run.status in (RunStatus.COMPLETED, RunStatus.FAILED, RunStatus.BLOCKED)
) if not run else False

async def _generate():
# Always send an initial progress event so the browser's EventSource
# fires onmessage and the client transitions from CONNECTING state.
yield format_sse_event(
ProgressEvent(
task_id=task_id,
phase="connected",
step=0,
total_steps=0,
message="Stream connected",
)
)

if already_terminal:
# Task already finished — emit a synthetic completion event.
status_map = {
RunStatus.COMPLETED: "completed",
RunStatus.FAILED: "failed",
RunStatus.BLOCKED: "blocked",
}
duration = 0.0
if latest_run.started_at and latest_run.completed_at:
duration = (latest_run.completed_at - latest_run.started_at).total_seconds()
yield format_sse_event(
CompletionEvent(
task_id=task_id,
status=status_map[latest_run.status],
duration_seconds=duration,
)
)
return

# Live stream — subscribe to the EventPublisher for real-time events.
async for chunk in event_stream_generator(task_id, publisher, request):
yield chunk

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Race condition: task can complete between the active-run check and the publisher subscription.

If the agent finishes (and publishes its CompletionEvent + END_OF_STREAM) after get_active_run returns a non-None run (line 897) but before event_stream_generator subscribes to the publisher (line 937), the client will never receive the completion event. The EventPublisher has no replay buffer, so those events are lost, and the stream will hang indefinitely on heartbeats.

Consider one of:

  1. Subscribe to the publisher before checking the run status, so no events are missed during the check.
  2. After subscribing, re-check the run status inside _generate() and emit a synthetic completion if the run became terminal while connecting.
  3. Add a bounded replay buffer to EventPublisher so late subscribers see recent events.
Sketch of option 2 (re-check after subscribe)
     async def _generate():
         yield format_sse_event(
             ProgressEvent(
                 task_id=task_id,
                 phase="connected",
                 step=0,
                 total_steps=0,
                 message="Stream connected",
             )
         )

         if already_terminal:
             # ... existing synthetic completion logic ...
             return

-        async for chunk in event_stream_generator(task_id, publisher, request):
-            yield chunk
+        # Subscribe first, then re-check to close the race window
+        queue = asyncio.Queue(maxsize=1000)
+        loop = asyncio.get_running_loop()
+        from codeframe.core.streaming import _Subscription
+        subscription = _Subscription(task_id, queue, loop)
+        async with publisher._lock:
+            publisher._subscribers[task_id].append(subscription)
+
+        # Re-check: did the run complete while we were setting up?
+        recheck_run = runtime.get_active_run(workspace, task_id)
+        if recheck_run is None:
+            latest = runtime.get_latest_run(workspace, task_id)
+            if latest and latest.status in (RunStatus.COMPLETED, RunStatus.FAILED, RunStatus.BLOCKED):
+                # Emit synthetic completion and bail out
+                yield format_sse_event(
+                    CompletionEvent(task_id=task_id, status=status_map[latest.status], duration_seconds=0.0)
+                )
+                # Clean up subscription
+                async with publisher._lock:
+                    if subscription in publisher._subscribers[task_id]:
+                        publisher._subscribers[task_id].remove(subscription)
+                return
+
+        # Proceed with normal event streaming using the already-established subscription
+        async for chunk in event_stream_generator(task_id, publisher, request, _subscription=subscription):
+            yield chunk
🤖 Prompt for AI Agents
In `@codeframe/ui/routers/tasks_v2.py` around lines 897 - 938, Race condition:
subscribe to the publisher before or re-check run state after subscribing so
CompletionEvent isn't missed; modify _generate in tasks_v2.py to (1) call
event_stream_generator (or otherwise subscribe to EventPublisher) before relying
on get_active_run/get_latest_run, or (preferred) after you obtain the
subscription, re-query runtime.get_latest_run(task_id) (and/or
runtime.get_active_run) inside _generate and if that newly fetched latest_run is
terminal emit a synthetic CompletionEvent via
format_sse_event(CompletionEvent(...)) before entering the async for loop over
event_stream_generator so any completion that happened during subscription is
delivered to the client; reference functions/vars: get_active_run,
get_latest_run, already_terminal, _generate, event_stream_generator,
EventPublisher, CompletionEvent, format_sse_event.

Comment on lines +897 to +934
run = runtime.get_active_run(workspace, task_id)
latest_run = runtime.get_latest_run(workspace, task_id) if not run else None
already_terminal = (
latest_run is not None
and latest_run.status in (RunStatus.COMPLETED, RunStatus.FAILED, RunStatus.BLOCKED)
) if not run else False

async def _generate():
# Always send an initial progress event so the browser's EventSource
# fires onmessage and the client transitions from CONNECTING state.
yield format_sse_event(
ProgressEvent(
task_id=task_id,
phase="connected",
step=0,
total_steps=0,
message="Stream connected",
)
)

if already_terminal:
# Task already finished — emit a synthetic completion event.
status_map = {
RunStatus.COMPLETED: "completed",
RunStatus.FAILED: "failed",
RunStatus.BLOCKED: "blocked",
}
duration = 0.0
if latest_run.started_at and latest_run.completed_at:
duration = (latest_run.completed_at - latest_run.started_at).total_seconds()
yield format_sse_event(
CompletionEvent(
task_id=task_id,
status=status_map[latest_run.status],
duration_seconds=duration,
)
)
return

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Never-executed task falls through to an indefinite live stream with no events.

When there's no active run and no latest run (task was created but never started), already_terminal is False and the generator subscribes to the publisher. Since no agent will ever publish events for this task, the client receives only heartbeats indefinitely.

Consider returning a synthetic completion/error event (or an HTTP 409/404) for tasks that have never been executed, so the client doesn't hang.

+    never_started = run is None and latest_run is None
+
     async def _generate():
         yield format_sse_event(
             ProgressEvent(
                 task_id=task_id,
                 phase="connected",
                 step=0,
                 total_steps=0,
                 message="Stream connected",
             )
         )

+        if never_started:
+            yield format_sse_event(
+                CompletionEvent(
+                    task_id=task_id,
+                    status="failed",
+                    duration_seconds=0.0,
+                    message="No execution run exists for this task",
+                )
+            )
+            return
+
         if already_terminal:

Test User added 2 commits February 6, 2026 16:09
When the server sends a completion event and closes the stream, the
EventSource retry counter was always 0 (reset by onmessage) causing
infinite reconnection. Now useTaskStream closes the EventSource
immediately upon receiving a completion event.
@claude

claude Bot commented Feb 6, 2026

Copy link
Copy Markdown

Code Review: Execution Monitor with Real-time SSE Streaming

This is a well-structured implementation that delivers a comprehensive execution monitoring experience. The code quality is generally high with good test coverage.

✅ Strengths

  1. Excellent hook composition with RAF batching to prevent render storms
  2. Comprehensive test coverage - 53 new tests
  3. Good accessibility with proper ARIA attributes
  4. Clean architecture with separation of concerns
  5. Proper SSE implementation with heartbeat keep-alive
  6. Good error handling throughout

🔴 Critical Issues

1. Memory Leak in Batch Monitor Polling
File: web-ui/src/components/execution/BatchExecutionMonitor.tsx:94

The polling interval cleanup may not fire properly when batch status changes rapidly. If batch.status transitions while an interval is active, a new interval starts without clearing the old one.

Fix: Clear the existing interval before starting a new one in the useEffect body.

2. Unhandled Promise Rejections
File: web-ui/src/components/execution/BatchExecutionMonitor.tsx:72-78

Task fetch failures are silently logged but not shown to the user, potentially leaving missing task names in the UI.

Fix: Set an error state or show a placeholder when promises reject.

🟡 Important Issues

3. Missing Error Boundary
Files: web-ui/src/app/execution pages

The execution pages lack error boundaries to catch runtime errors in child components.

4. Race Condition in SSE Connection
File: web-ui/src/hooks/useExecutionMonitor.ts:103-106

Potential race where sseStatusRef.current is checked before the sseStatus state update completes.

Fix: Use sseStatus from state closure inside setState instead of checking the ref.

🔵 Minor Improvements

  • Add useMemo for displayEvents filter (EventStream.tsx:33)
  • Use button elements for accordion keyboard navigation
  • Replace console.error with proper logging service
  • Extract magic number 5000 to BATCH_POLL_INTERVAL_MS constant
  • Extract inline button styles to design system

✅ Security & Best Practices

  • XSS mitigation via React automatic escaping ✅
  • Proper rate limiting on streaming endpoints ✅
  • SSE connection leak prevention ✅

🧪 Test Coverage

Excellent coverage (253 tests passing). Consider adding tests for:

  • Rapid state transitions with RAF batching
  • Stale closure prevention
  • SSE reconnection behavior
  • Memory leak scenarios

🎯 Summary

Overall: APPROVE with recommendations

This successfully bridges the navigation gap from issue 331 and delivers a polished execution monitoring experience. The critical issues are fixable and do not block merging, but should be addressed soon after merge.

Priority fixes:

  1. Memory leak in batch polling (Critical)
  2. Unhandled promise rejections (Critical)
  3. Error boundary (Important)

Estimated fix effort: 1-2 hours

Great work on the comprehensive test coverage and clean architecture!

@claude

claude Bot commented Feb 6, 2026

Copy link
Copy Markdown

Code Review: Execution Monitor with Real-time SSE Streaming

This is an impressive implementation of a real-time execution monitoring system. The code demonstrates excellent architecture, proper separation of concerns, and comprehensive testing. Here's my detailed review:


Strengths

1. Architecture & Design Patterns

  • Clean separation of concerns: Backend streaming utilities properly extracted from endpoint logic
  • requestAnimationFrame batching: Smart optimization in useExecutionMonitor prevents per-event re-renders during rapid event streams
  • Stable polling via refs: Avoids stale closure bugs by using useRef for task deduplication
  • Heartbeat mechanism: 15s heartbeat interval in SSE prevents proxy/browser timeouts - well documented

2. Backend Implementation

# streaming_v2.py - Excellent cleanup
  • Properly refactored from monolithic endpoint to shared utilities module
  • Direct _Subscription usage with manual queue management is more explicit than the old subscribe() generator approach
  • Heartbeat implementation via asyncio.wait_for timeout is elegant
  • Proper cleanup in finally block prevents subscriber leaks

3. Frontend Implementation

  • useExecutionMonitor hook: Well-designed state accumulation with clear public interface
  • EventStream component: Smart auto-scroll behavior with manual override and "New events" button
  • Accessibility: Good use of ARIA attributes (aria-label, aria-expanded, role="log")
  • Error handling: try/catch on batch operations, graceful handling of stop/cancel failures

4. Test Coverage

  • 53 new tests with 253 total passing
  • Comprehensive coverage: hooks, components, API layer, event styling
  • Good use of test fixtures and helper functions
  • Mock strategy for requestAnimationFrame is well thought out

🔍 Areas for Consideration

1. Backend: Direct _Subscription Access (Minor)

# streaming_v2.py:110-114
from codeframe.core.streaming import _Subscription
subscription = _Subscription(task_id, queue, loop)

async with publisher._lock:
    publisher._subscribers[task_id].append(subscription)

Concern: Directly accessing _Subscription (private class, indicated by leading underscore) and manually managing publisher._subscribers breaks encapsulation.

Suggestion: Consider adding a public method to EventPublisher:

# In core/streaming.py
class EventPublisher:
    def subscribe_with_queue(self, task_id: str, queue: asyncio.Queue) -> _Subscription:
        """Create and register a subscription with a provided queue."""
        subscription = _Subscription(task_id, queue, self.loop)
        async with self._lock:
            self._subscribers[task_id].append(subscription)
        return subscription

This would:

  • Maintain encapsulation
  • Make the intent clearer
  • Allow internal implementation changes without breaking streaming_v2.py

2. Frontend: Race Condition in useExecutionMonitor (Minor)

// useExecutionMonitor.ts:104-106
if (sseStatusRef.current === 'error' || sseStatusRef.current === 'closed') {
  agentState = 'DISCONNECTED';
}

Potential issue: The rAF flush callback reads sseStatusRef.current, but this ref is updated asynchronously in a separate effect (line 153-155). There's a tiny window where the ref might not reflect the latest SSE status when the rAF callback executes.

Impact: Low - this would only cause a brief visual inconsistency (wrong badge color for 1 frame).

Suggestion: Could be eliminated by:

const scheduleFlush = useCallback(() => {
  if (rafRef.current !== null) return;
  rafRef.current = requestAnimationFrame(() => {
    rafRef.current = null;
    const events = [...eventsRef.current];
    const lastNonHeartbeat = findLastNonHeartbeat(events);
    const currentSSEStatus = sseStatusRef.current; // Capture once
    
    let agentState: UIAgentState = lastNonHeartbeat
      ? deriveAgentState(lastNonHeartbeat)
      : 'CONNECTING';
    
    if (currentSSEStatus === 'error' || currentSSEStatus === 'closed') {
      agentState = 'DISCONNECTED';
    }
    // ... rest of logic
  });
}, []); // Remove sseStatusRef from deps - it's a ref

3. Error Handling: Blocker Answer Submission (Documentation)

// [taskId]/page.tsx:47-58
const handleStop = useCallback(async () => {
  try {
    await tasksApi.stopExecution(workspacePath, taskId);
  } catch (err) {
    // Silently ignore 400/404
  }
}, [workspacePath, taskId]);

Observation: This pattern appears in multiple places where API errors are intentionally swallowed. While the comments explain this is expected for already-completed runs, users might not see meaningful error messages if something truly goes wrong.

Suggestion: Consider:

  • Emitting these "expected errors" to a debug console/logging service
  • Adding a subtle toast notification for unexpected errors
  • Or explicitly document this UX decision in the component docstring

4. Performance: Event Filtering in EventStream (Very Minor)

// EventStream.tsx:33
const displayEvents = events.filter((e) => e.event_type !== 'heartbeat');

Observation: This filter runs on every render. With thousands of events, this could become expensive.

Suggestion: Consider memoization:

const displayEvents = useMemo(
  () => events.filter((e) => e.event_type !== 'heartbeat'),
  [events]
);

🔒 Security Considerations

All good here:

  • API key authentication properly enforced (inherited from parent routers)
  • No XSS vulnerabilities (React escapes by default)
  • No direct DOM manipulation that could introduce injection
  • Rate limiting applies to streaming endpoints via existing configuration

🎯 Alignment with CLAUDE.md

Excellent adherence to project guidelines:

  • Follows Phase 2 (Server Layer) thin adapter pattern
  • Core streaming logic in codeframe/core/streaming.py (headless)
  • UI properly separated in web-ui/ with no coupling to core
  • Tests follow v2 marker convention
  • No legacy UI dependencies introduced

📝 Minor Nitpicks

  1. Typo in streaming_v2.py docstring (line 7):

    • Current: "The actual SSE endpoint for tasks is in tasks_v2.py"
    • Clarification: Might be helpful to mention this change in the module-level comment to explain why the endpoint was moved
  2. Magic number in EventStream.tsx:56:

    const isAtBottom = scrollHeight - scrollTop - clientHeight < 40;

    Consider extracting as const SCROLL_THRESHOLD = 40; for clarity


🎉 Summary

This is excellent work. The implementation is well-architected, thoroughly tested, and properly integrated with the existing codebase. The issues I've identified are all minor and don't block merging:

  • The _Subscription direct access is a small encapsulation concern but works correctly
  • The race condition in useExecutionMonitor has negligible practical impact
  • The performance optimizations are micro-optimizations that can be done later if needed

Recommendation: ✅ Approve and merge

The 253 passing tests, proper accessibility attributes, and clean separation of concerns demonstrate this is production-ready code. The execution monitor will significantly improve the developer experience for real-time task monitoring.


Great job on this PR! The SSE implementation is particularly well done with the heartbeat mechanism and graceful disconnection handling.

@frankbria frankbria merged commit aff25e5 into main Feb 6, 2026
11 checks passed
@frankbria frankbria deleted the feature/332-execution-monitor-view branch February 6, 2026 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Phase 3] Execution Monitor View - Real-time Agent Monitoring

1 participant